Mutated thermostable nucleic acid polymerase enzyme from Thermotoga maritima

ABSTRACT

A purified thermostable enzyme is derived from the eubacterium Thermotoga maritima. The enzyme has a molecular weight as determined by gel electrophoresis of about 97 kilodaltons and DNA polymerase I activity. The enzyme can be produced from native or recombinant host cells and can be used with primers and nucleoside triphosphates in a temperature-cycling chain reaction where at least one nucleic acid sequence is amplified in quantity from an existing sequence.

This application is a continuation-in-part of U.S. Ser. No. 07/567,244, filed Aug. 13, 1990, now U.S. Pat. No. 5,374,553.

TECHNICAL FIELD

The present invention relates to a purified, thermostable DNA polymerase from the hyperthermophilic eubacterium Thermotoga maritima and means for isolating and producing the enzyme. Thermostable DNA polymerases are useful in many recombinant DNA techniques, especially nucleic acid amplification by the polymerase chain reaction (PCR).

BACKGROUND ART

In Huber et al., 1986, Arch. Microbiol. 144:324-333, the isolation of the bacterium Thermotoga maritima is described. T. maritima is a eubacterium that is strictly anaerobic, rod-shaped, fermentative, hyperthermophilic, and grows between 55° C. and 90° C., with an optimum growth temperature of about 80° C. This eubacterium has been isolated from geothermally heated sea floors in Italy and the Azores. T. maritima cells have a sheath-like structure and monotrichous flagellation. T. maritima is classified in the eubacterial kingdom by virtue of having murein and fatty acid-containing lipids, diphtheria-toxin-resistant elongation factor 2, an RNA polymerase subunit pattern, and sensitivity to antibiotics.

Extensive research has been conducted on the isolation of DNA polymerases from mesophilic microorganisms such as E. coli. See, for example, Bessman et al., 1957, J. Biol. Chem. 223:171-177, and Buttin and Kornberg, 1966, J. Biol. Chem. 241:5419-5427. Much less investigation has been made on the isolation and purification of DNA polymerases from thermophiles such as Thermotoga maritima. In Kaledin et al., 1980, Biokhymiya 45:644-651, a six-step isolation and enrichment procedure for DNA polymerase activity from cells of a Thermus aquaticus YT-1 strain is disclosed. These steps involve isolation of crude extract, DEAE-cellulose chromatography, fractionation on hydroxyapatite, fractionation on DEAE-cellulose, and chromatography on single-strand DNA-cellulose. The molecular weight of the purified enzyme is reported by Kaledin et al. as 62,000 daltons per monomeric unit.

A second enrichment scheme for a polymerase from Thermus aquaticus is described in Chien et al., 1976, J. Bacteriol. 127:1550-1557. In this process, the crude extract is applied to a DEAE-Sephadex column. The dialyzed pooled fractions are then subjected to treatment on a phosphocellulose column. The pooled fractions are dialyzed, and bovine serum albumin (BSA) is added to prevent loss of polymerase activity. The resulting mixture is loaded on a DNA-cellulose column. The pooled material from the column is dialyzed. The molecular weight of the purified protein is reported to be about 63,000 daltons to 68,000 daltons.

The use of thermostable enzymes, such as those described in Chien et al. and Kaledin et al., to amplify existing nucleic acid sequences in amounts that are large compared to the amount initially present is described in U.S. Pat. Nos. 4,683,195; 4,683,202; and 4,965,188, which describe the PCR process, each of which is incorporated herein by reference. Primers, template, nucleoside triphosphates, the appropriate buffer and reaction conditions, and polymerase are used in the PCR process, which involves denaturation of target DNA, hybridization of primers, and synthesis of complementary strands. The extension product of each primer becomes a template for the production of the desired nucleic acid sequence. The patents disclose that, if the polymerase employed is a thermostable enzyme, then polymerase need not be added after every denaturation step, because heat will not destroy the polymerase activity.
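As a rough illustration of why repeated cycling yields amounts that are large compared to the amount initially present, the idealized copy number after a given number of cycles can be sketched as follows. The function name and the per-cycle efficiency parameter are assumptions introduced for illustration; they do not come from the cited patents, and real reactions plateau well before the ideal value.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized number of target copies after `cycles` rounds of PCR.

    `efficiency` is the assumed fraction of templates copied in each cycle
    (1.0 means perfect doubling). Plateau effects are ignored.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    # Each cycle multiplies the template pool by (1 + efficiency).
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    for n in (10, 20, 30):
        print(n, pcr_copies(1, n))
```

Under perfect doubling a single template gives 1024 copies after 10 cycles, which is why a heat-stable polymerase that survives every denaturation step is so convenient.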

U.S. Pat. No. 4,889,818, European Patent Publication No. 258,017, and PCT Publication No. 89/06691, the disclosures of which are incorporated herein by reference, describe the isolation and recombinant expression of an ~94 kDa thermostable DNA polymerase from Thermus aquaticus and the use of that polymerase in PCR. Although T. aquaticus DNA polymerase is especially preferred for use in PCR and other recombinant DNA techniques, there remains a need for other thermostable polymerases.

Accordingly, there is a desire in the art to produce a purified, thermostable DNA polymerase that may be used to improve the PCR process described above and to improve the results obtained when using a thermostable DNA polymerase in other recombinant techniques, such as DNA sequencing, nick-translation, and even reverse transcription. The present invention helps meet that need by providing recombinant expression vectors and purification protocols for Thermotoga maritima DNA polymerase.

DISCLOSURE OF INVENTION

The present invention provides a purified thermostable DNA polymerase enzyme that catalyzes combination of nucleoside triphosphates to form a nucleic acid strand complementary to a nucleic acid template strand. The purified enzyme is the DNA polymerase from Thermotoga maritima (Tma) and has a molecular weight of about 97 kilodaltons (kDa) as measured by SDS-PAGE and an inferred molecular weight, from the nucleotide sequence of the Tma DNA polymerase gene, of 102 kDa. This purified material may be used in PCR to produce a given nucleic acid sequence in amounts that are large compared to the amount initially present so that the sequences can be manipulated and/or analyzed easily.

The gene encoding Tma DNA polymerase enzyme from Thermotoga maritima has also been identified, cloned, sequenced, and expressed at high level and provides yet another means to prepare the thermostable enzyme of the present invention. In addition to the intact gene and the coding sequence for the Tma enzyme, derivatives of the coding sequence for Tma DNA polymerase are also provided.

The invention also encompasses a stable enzyme composition comprising a purified, thermostable Tma enzyme as described above in a buffer containing one or more non-ionic polymeric detergents.

Finally, the invention provides a method of purification for the thermostable polymerase of the invention. This method involves preparing a crude extract from Thermotoga maritima cells, adjusting the ionic strength of the crude extract so that the DNA polymerase dissociates from nucleic acid in the extract, subjecting the extract to hydrophobic interaction chromatography, subjecting the extract to DNA binding protein affinity chromatography, and subjecting the extract to cation or anion exchange or hydroxyapatite chromatography. In a preferred embodiment, these steps are performed sequentially in the order given above. The nucleotide binding protein affinity chromatography step is preferred for separating the DNA polymerase from endonuclease proteins.

MODES FOR CARRYING OUT THE INVENTION

The present invention provides DNA sequences and expression vectors that encode Tma DNA polymerase, purification protocols for Tma DNA polymerase, preparations of purified Tma DNA polymerase, and methods for using Tma DNA polymerase. To facilitate understanding of the invention, a number of terms are defined below.

The terms "cell," "cell line," and "cell culture" are used interchangeably and all such designations include progeny. Thus, the words "transformants" or "transformed cells" include the primary transformed cell and cultures derived from that cell without regard to the number of transfers. All progeny may not be precisely identical in DNA content, due to deliberate or inadvertent mutations. Mutant progeny that have the same functionality as screened for in the originally transformed cell are included in the definition of transformants.

The term "control sequences" refers to DNA sequences necessary for the expression of an operably linked coding sequence in a particular host organism. The control sequences that are suitable for procaryotes, for example, include a promoter, optionally an operator sequence, a ribosome binding site, and possibly other sequences, such as transcription termination sequences. Eucaryotic cells are known to utilize promoters, polyadenylation signals, and enhancers.

The term "expression system" refers to DNA sequences containing a desired coding sequence and control sequences in operable linkage, so that hosts transformed with these sequences are capable of producing the encoded proteins. To effect transformation, the expression system may be included on a vector; the relevant DNA can also be integrated into the host chromosome.

The term "gene" refers to a DNA sequence that codes for the expression of a recoverable bioactive polypeptide or precursor. Thus, the Tma DNA polymerase gene includes the promoter and Tma DNA polymerase coding sequence. The polypeptide can be encoded by a full-length coding sequence or by any portion of the coding sequence so long as the desired enzymatic activity is retained.

The term "operably linked" refers to the positioning of the coding sequence such that control sequences will function to drive expression of the encoded protein. Thus, a coding sequence "operably linked" to a control sequence refers to a configuration wherein the coding sequence can be expressed under the direction of the control sequence.

The term "mixture" as it relates to mixtures containing Tma polymerase refers to a collection of materials that includes Tma polymerase but can also include other proteins. If the Tma polymerase is derived from recombinant host cells, the other proteins will ordinarily be those associated with the host. Where the host is bacterial, the contaminating proteins will be bacterial proteins.

The term "non-ionic polymeric detergents" refers to surface-active agents that have no ionic charge and that are characterized, for purposes of this invention, by an ability to stabilize the Tma enzyme at a pH range of from about 3.5 to about 9.5, preferably from 4 to 8.5. Numerous examples of suitable non-ionic polymeric detergents are presented in copending U.S. patent application Ser. No. 387,003, filed Jul. 28, 1989, the disclosure of which is incorporated herein by reference.

The term "oligonucleotide" as used herein is defined as a molecule comprised of two or more deoxyribonucleotides or ribonucleotides, preferably more than three, and usually more than ten. The exact size will depend on many factors, which in turn depend on the ultimate function or use of the oligonucleotide. The oligonucleotide may be derived synthetically or by cloning.

The term "primer" as used herein refers to an oligonucleotide that is capable of acting as a point of initiation of synthesis when placed under conditions in which primer extension is initiated. An oligonucleotide "primer" may occur naturally, as in a purified restriction digest, or be produced synthetically. Synthesis of a primer extension product that is complementary to a nucleic acid strand is initiated in the presence of four different nucleoside triphosphates and the Tma thermostable enzyme in an appropriate buffer at a suitable temperature. A "buffer" includes cofactors (such as divalent metal ions) and salt (to provide the appropriate ionic strength), adjusted to the desired pH. For Tma polymerase, the buffer preferably contains 1 to 3 mM of a magnesium salt, preferably MgCl₂, 50 to 200 μM of each nucleoside triphosphate, and 0.2 to 1 μM of each primer, along with 50 mM KCl, 10 mM Tris buffer (pH 8.0-8.4), and 100 μg/ml gelatin (although gelatin is not required and should be avoided in some applications, such as DNA sequencing).
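The preferred buffer ranges just stated lend themselves to a simple range check. The sketch below is illustrative only: the class name, field names, and the default pH of 8.3 are assumptions chosen for this example, while the numeric ranges are those given in the paragraph above.

```python
from dataclasses import dataclass
from typing import List

# Preferred ranges quoted in the text (inclusive bounds).
PREFERRED_RANGES = {
    "mgcl2_mM": (1.0, 3.0),
    "dntp_uM_each": (50.0, 200.0),
    "primer_uM_each": (0.2, 1.0),
    "tris_ph": (8.0, 8.4),
}


@dataclass(frozen=True)
class TmaReactionBuffer:
    """Hypothetical record of one reaction buffer recipe."""
    mgcl2_mM: float
    dntp_uM_each: float
    primer_uM_each: float
    tris_ph: float = 8.3              # assumed default inside the stated 8.0-8.4 range
    kcl_mM: float = 50.0
    tris_mM: float = 10.0
    gelatin_ug_per_ml: float = 100.0  # optional; set to 0 for DNA sequencing

    def out_of_range(self) -> List[str]:
        """Return the names of fields that fall outside the preferred ranges."""
        return [
            name
            for name, (low, high) in PREFERRED_RANGES.items()
            if not low <= getattr(self, name) <= high
        ]


if __name__ == "__main__":
    recipe = TmaReactionBuffer(mgcl2_mM=2.0, dntp_uM_each=200.0, primer_uM_each=0.5)
    print(recipe.out_of_range())
```

A recipe with 5 mM MgCl₂, for instance, would be flagged on the `mgcl2_mM` field only.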

The primer is single-stranded for maximum efficiency in amplification, but may alternatively be double-stranded. If double-stranded, the primer is first treated to separate its strands before being used to prepare extension products. The primer is usually an oligodeoxyribonucleotide. The primer must be sufficiently long to prime the synthesis of extension products in the presence of the polymerase enzyme. The exact length of a primer will depend on many factors, such as source of primer and result desired, and the reaction temperature must be adjusted depending on primer length to ensure proper annealing of primer to template. Depending on the complexity of the target sequence, the oligonucleotide primer typically contains 15 to 35 nucleotides. Short primer molecules generally require cooler temperatures to form sufficiently stable complexes with template.
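One common rule of thumb for relating primer length and composition to annealing temperature is the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) in °C. It is not taken from this disclosure and is only a rough guide for short oligonucleotides, but it illustrates why shorter primers call for cooler temperatures. The function name below is an assumption.

```python
def wallace_tm(primer: str) -> int:
    """Rough melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = primer.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("primer must be a non-empty string of A, C, G, T")
    at_count = seq.count("A") + seq.count("T")
    gc_count = seq.count("G") + seq.count("C")
    return 2 * at_count + 4 * gc_count


if __name__ == "__main__":
    # Compare a short and a longer primer of similar composition.
    print(wallace_tm("ATGGCGAGAC"))
    print(wallace_tm("ATGGCGAGACTATTTCTC"))
```

Under this approximation every added base raises the estimate, so a shorter primer always scores lower than a longer one that contains it.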

A primer is selected to be "substantially" complementary to a strand of specific sequence of the template. A primer must be sufficiently complementary to hybridize with a template strand for primer elongation to occur. A primer sequence need not reflect the exact sequence of the template. For example, a non-complementary nucleotide fragment may be attached to the 5' end of the primer, with the remainder of the primer sequence being substantially complementary to the strand. Non-complementary bases or longer sequences can be interspersed into the primer, provided that the primer sequence has sufficient complementarity with the sequence of the template to hybridize and thereby form a template/primer complex for synthesis of the extension product of the primer.
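The idea of a primer carrying a non-complementary 5' fragment can be sketched by measuring how much of the primer's 3' end pairs exactly with a template strand. This is a simplified illustration that only handles perfect matches at the 3' end; the helper names are assumptions, and the 20-nucleotide template is simply the first 20 bases of SEQ ID NO: 1 used as sample data.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def longest_annealing_suffix(primer: str, template: str) -> int:
    """Length of the longest 3' portion of `primer` that is exactly
    complementary to some site on `template` (both written 5' to 3')."""
    primer, template = primer.upper(), template.upper()
    best = 0
    for n in range(1, len(primer) + 1):
        # A longer suffix can only pair if every shorter suffix pairs too,
        # so the search can stop at the first failure.
        if reverse_complement(primer[-n:]) in template:
            best = n
        else:
            break
    return best


if __name__ == "__main__":
    template = "ATGGCGAGACTATTTCTCTT"
    primer = "GGATCC" + "AAGAGAAATA"  # 5' tail + template-specific 3' portion
    print(longest_annealing_suffix(primer, template))
```

Here the six-base 5' tail does not pair with the template, while the ten 3' bases do, which is the arrangement the paragraph describes.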

The terms "restriction endonucleases" and "restriction enzymes" refer to bacterial enzymes that cut double-stranded DNA at or near a specific nucleotide sequence.

The term "thermostable enzyme" refers to an enzyme which is stable to heat and is heat resistant and catalyzes (facilitates) combination of the nucleotides in the proper manner to form primer extension products that are complementary to a nucleic acid strand. Generally, synthesis of a primer extension product begins at the 3' end of the primer and proceeds towards the 5' end of the template strand until synthesis terminates. A thermostable enzyme must be able to renature and regain activity after brief (i.e., 5 to 30 seconds) exposure to temperatures of 80° C. to 105° C. and must have a temperature optimum of above 60° C.

The Tma thermostable DNA polymerase enzyme of the present invention satisfies the requirements for effective use in the amplification reaction known as the polymerase chain reaction or PCR. The Tma DNA polymerase enzyme does not become irreversibly denatured (inactivated) when subjected to the elevated temperatures for the time necessary to effect denaturation of double-stranded nucleic acids, a key step in the PCR process. Irreversible denaturation of an enzyme for purposes herein refers to permanent and complete loss of enzymatic activity.

The heating conditions necessary to effect nucleic acid denaturation will depend, e.g., on the buffer salt concentration and the composition, length, and amount of the nucleic acids being denatured, but typically the denaturation temperature ranges from about 80° C. to about 105° C. for a few seconds to minutes. Higher temperatures may be required for nucleic acid denaturation as the buffer salt concentration and/or GC composition of the nucleic acid is increased. The Tma enzyme does not become irreversibly denatured upon relatively short exposures to temperatures of about 80°-105° C.
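Because the required denaturation temperature is tied to GC composition, a quick GC-fraction check of a target sequence can be sketched as below. The helper name is hypothetical, and no particular temperature correction is implied by the result.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases in `seq` that are G or C (case-insensitive)."""
    bases = seq.upper()
    if not bases:
        raise ValueError("sequence must not be empty")
    return (bases.count("G") + bases.count("C")) / len(bases)


if __name__ == "__main__":
    print(gc_fraction("ATGGCGAGAC"))
```

A target with a higher GC fraction would be expected, per the paragraph above, to need a denaturation temperature toward the upper end of the stated range.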

The Tma thermostable enzyme has an optimum temperature at which it functions that is higher than about 60° C. Temperatures below 60° C. facilitate hybridization of primer to template, but depending on salt composition and concentration and primer composition and length, hybridization of primer to template can occur at higher temperatures (e.g., 60°-80° C.), which may promote specificity of the primer elongation reaction. The higher the temperature optimum for the enzyme, the greater the specificity and/or selectivity of the primer-directed extension process. The Tma enzyme exhibits activity over a broad temperature range from about 45° C. to 90° C.; a preferred optimum temperature is 75°-80° C.

The present invention also provides DNA sequences encoding the thermostable DNA polymerase activity of Thermotoga maritima. The amino acid sequence encoded by this sequence has homology to portions of the thermostable DNA polymerases of Thermus aquaticus and Thermus thermophilus. The complete coding sequence, from the 5'-ATG start codon to the TGA-3' stop codon, of the Tma DNA polymerase gene is depicted below and listed as SEQ ID NO: 1 in the sequence listing section. The sequence is numbered for reference.

__________________________________________________________________________
   1  ATGGCGAGAC TATTTCTCTT TGATGGAACT GCTCTGGCCT ACAGAGCGTA
  51  CTATGCGCTC GATAGATCGC TTTCTACTTC CACCGGCATT CCCACAAACG
 101  CCACATACGG TGTGGCGAGG ATGCTGGTGA GATTCATCAA AGACCATATC
 151  ATTGTCGGAA AAGACTACGT TGCTGTGGCT TTCGACAAAA AAGCTGCCAC
 201  CTTCAGACAC AAGCTCCTCG AGACTTACAA GGCTCAAAGA CCAAAGACTC
 251  CGGATCTCCT GATTCAGCAG CTTCCGTACA TAAAGAAGCT GGTCGAAGCC
 301  CTTGGAATGA AAGTGCTGGA GGTAGAAGGA TACGAAGCGG ACGATATAAT
 351  TGCCACTCTG GCTGTGAAGG GGCTTCCGCT TTTTGATGAA ATATTCATAG
 401  TGACCGGAGA TAAAGACATG CTTCAGCTTG TGAACGAAAA GATCAAGGTG
 451  TGGCGAATCG TAAAAGGGAT ATCCGATCTG GAACTTTACG ATGCGCAGAA
 501  GGTGAAGGAA AAATACGGTG TTGAACCCCA GCAGATCCCG GATCTTCTGG
 551  CTCTAACCGG AGATGAAATA GACAACATCC CCGGTGTAAC TGGGATAGGT
 601  GAAAAGACTG CTGTTCAGCT TCTAGAGAAG TACAAAGACC TCGAAGACAT
 651  ACTGAATCAT GTTCGCGAAC TTCCTCAAAA GGTGAGAAAA GCCCTGCTTC
 701  GAGACAGAGA AAACGCCATT CTCAGCAAAA AGCTGGCGAT TCTGGAAACA
 751  AACGTTCCCA TTGAAATAAA CTGGGAAGAA CTTCGCTACC AGGGCTACGA
 801  CAGAGAGAAA CTCTTACCAC TTTTGAAAGA ACTGGAATTC GCATCCATCA
 851  TGAAGGAACT TCAACTGTAC GAAGAGTCCG AACCCGTTGG ATACAGAATA
 901  GTGAAAGACC TAGTGGAATT TGAAAAACTC ATAGAGAAAC TGAGAGAATC
 951  CCCTTCGTTC GCCATAGATC TTGAGACGTC TTCCCTCGAT CCTTTCGACT
1001  GCGACATTGT CGGTATCTCT GTGTCTTTCA AACCAAAGGA AGCGTACTAC
1051  ATACCACTCC ATCATAGAAA CGCCCAGAAC CTGGACGAAA AAGAGGTTCT
1101  GAAAAAGCTC AAAGAAATTC TGGAGGACCC CGGAGCAAAG ATCGTTGGTC
1151  AGAATTTGAA ATTCGATTAC AAGGTGTTGA TGGTGAAGGG TGTTGAACCT
1201  GTTCCTCCTT ACTTCGACAC GATGATAGCG GCTTACCTTC TTGAGCCGAA
1251  CGAAAAGAAG TTCAATCTGG ACGATCTCGC ATTGAAATTT CTTGGATACA
1301  AAATGACATC TTACCAAGAG CTCATGTCCT TCTCTTTTCC GCTGTTTGGT
1351  TTCAGTTTTG CCGATGTTCC TGTAGAAAAA GCAGCGAACT ACTCCTGTGA
1401  AGATGCAGAC ATCACCTACA GACTTTACAA GACCCTGAGC TTAAAACTCC
1451  ACGAGGCAGA TCTGGAAAAC GTGTTCTACA AGATAGAAAT GCCCCTTGTG
1501  AACGTGCTTG CACGGATGGA ACTGAACGGT GTGTATGTGG ACACAGAGTT
1551  CCTGAAGAAA CTCTCAGAAG AGTACGGAAA AAAACTCGAA GAACTGGCAG
1601  AGGAAATATA CAGGATAGCT GGAGAGCCGT TCAACATAAA CTCACCGAAG
1651  CAGGTTTCAA GGATCCTTTT TGAAAAACTC GGCATAAAAC CACGTGGTAA
1701  AACGACGAAA ACGGGAGACT ATTCAACACG CATAGAAGTC CTCGAGGAAC
1751  TTGCCGGTGA ACACGAAATC ATTCCTCTGA TTCTTGAATA CAGAAAGATA
1801  CAGAAATTGA AATCAACCTA CATAGACGCT CTTCCCAAGA TGGTCAACCC
1851  AAAGACCGGA AGGATTCATG CTTCTTTCAA TCAAACGGGG ACTGCCACTG
1901  GAAGACTTAG CAGCAGCGAT CCCAATCTTC
                                                       AGAACCTCCC                                                                              GACGAAAAGT                         1951                                                                             GAAGAGGGAA                                                                              AAGAAATCAG                                                                              GAAAGCGATA                                                                              GTTCCTCAGG                                                                              ATCCAAACTG                         2001                                                                             GTGGATCGTC                                                                              AGTGCCGACT                                                                              ACTCCCAAAT                                                                              AGAACTGAGG                                                                              ATCCTCGCCC                         2051                                                                             ATCTCAGTGG                                                                              TGATGAGAAT                                                                              CTTTTGAGGG                                                                              CATTCGAAGA                                                                              GGGCATCGAC                         2101                                                                             GTCCACACTC                                                                              TAACAGCTTC                                                                              CAGAATATTC                                                                              AACGTGAAAC                                                               
               CCGAAGAAGT                         2151                                                                             AACCGAAGAA                                                                              ATGCGCCGCG                                                                              CTGGTAAAAT                                                                              GGTTAATTTT                                                                              TCCATCATAT                         2201                                                                             ACGGTGTAAC                                                                              ACCTTACGGT                                                                              CTGTCTGTGA                                                                              GGCTTGGAGT                                                                              ACCTGTGAAA                         2251                                                                             GAAGCAGAAA                                                                              AGATGATCGT                                                                              CAACTACTTC                                                                              GTCCTCTACC                                                                              CAAAGGTGCG                         2301                                                                             CGATTACATT                                                                              CAGAGGGTCG                                                                              TATCGGAAGC                                                                              GAAAGAAAAA                                                                              GGCTATGTTA                         2351                                                                          
   GAACGCTGTT                                                                              TGGAAGAAAA                                                                              AGAGACATAC                                                                              CACAGCTCAT                                                                              GGCCCGGGAC                         2401                                                                             AGGAACACAC                                                                              AGGCTGAAGG                                                                              AGAACGAATT                                                                              GCCATAAACA                                                                              CTCCCATACA                         2451                                                                             GGGTACAGCA                                                                              GCGGATATAA                                                                              TAAAGCTGGC                                                                              TATGATAGALA                                                                             ATAGACAGGG                         2501                                                                             AACTGAAAGA                                                                              AAGAAAAATG                                                                              AGATCGAAGA                                                                              TGATCATACA                                                                              GGTCCACGAC                         2551                                                                             GAACTGGTTT                                                                              TTGAAGTGCC                           
                                                   CAATGAGGAA                                                                              AAGGACGCGC                                                                              TCGTCGAGCT                         2601                                                                             GGTGAAAGAC                                                                              AGAATGACGA                                                                              ATGTGGTAAA                                                                              GCTTTCAGTG                                                                              CCGCTCGAAG                         2651                                                                             TGGATGTAAC                                                                              CATCGGCAAA                                                                              ACATGGTCGT                                                                              GA                                           __________________________________________________________________________

Both the complete coding sequence of the Tma DNA polymerase gene and the encoded amino acid sequence in three-letter abbreviation are provided in the Sequence Listing section as SEQ ID NO: 1. For convenience, the amino acid sequence encoded by the Tma DNA polymerase gene sequence is also depicted below in one-letter abbreviation from amino-terminus to carboxy-terminus; the sequence is numbered for reference.

    __________________________________________________________________________
      1  MARLFLFDGT ALAYRAYYAL DRSLSTSTGI PTNATYGVAR MLVRFIKDHI
     51  IVGKDYVAVA FDKKAATFRH KLLETYKAQR PKTPDLLIQQ LPYIKKLVEA
    101  LGMKVLEVEG YEADDIIATL AVKGLPLFDE IFIVTGDKDM LQLVNEKIKV
    151  WRIVKGISDL ELYDAQKVKE KYGVEPQQIP DLLALTGDEI DNIPGVTGIG
    201  EKTAVQLLEK YKDLEDILNH VRELPQKVRK ALLRDRENAI LSKKLAILET
    251  NVPIEINWEE LRYQGYDREK LLPLLKELEF ASIMKELQLY EESEPVGYRI
    301  VKDLVEFEKL IEKLRESPSF AIDLETSSLD PFDCDIVGIS VSFKPKEAYY
    351  IPLHHRNAQN LDEKEVLKKL KEILEDPGAK IVGQNLKFDY KVLMVKGVEP
    401  VPPYFDTMIA AYLLEPNEKK FNLDDLALKF LGYKMTSYQE LMSFSFPLFG
    451  FSFADVPVEK AANYSCEDAD ITYRLYKTLS LKLHEADLEN VFYKIEMPLV
    501  NVLARMELNG VYVDTEFLKK LSEEYGKKLE ELAEEIYRIA GEPFNINSPK
    551  QVSRILFEKL GIKPRGKTTK TGDYSTRIEV LEELAGEHEI IPLILEYRKI
    601  QKLKSTYIDA LPKMVNPKTG RIHASFNQTG TATGRLSSSD PNLQNLPTKS
    651  EEGKEIRKAI VPQDPNWWIV SADYSQIELR ILAHLSGDEN LLRAFEEGID
    701  VHTLTASRIF NVKPEEVTEE MRRAGKMVNF SIIYGVTPYG LSVRLGVPVK
    751  EAEKMIVNYF VLYPKVRDYI QRVVSEAKEK GYVRTLFGRK RDIPQLMARD
    801  RNTQAEGERI AINTPIQGTA ADIIKLAMIE IDRELKERKM RSKMIIQVHD
    851  ELVFEVPNEE KDALVELVKD RMTNVVKLSV PLEVDVTIGK TWS
    __________________________________________________________________________

The one-letter abbreviations for the amino acids are shown below for convenience.

    ______________________________________
    F = Phenylalanine      H = Histidine
    L = Leucine            Q = Glutamine
    I = Isoleucine         N = Asparagine
    M = Methionine         K = Lysine
    V = Valine             D = Aspartic Acid
    S = Serine             E = Glutamic Acid
    P = Proline            C = Cysteine
    T = Threonine          W = Tryptophan
    A = Alanine            R = Arginine
    Y = Tyrosine           G = Glycine
    ______________________________________

The coding sequence for Tma DNA polymerase was identified by a "degenerate primer" method that has broad utility and is an important aspect of the present invention. In the degenerate primer method, DNA fragments of any thermostable polymerase coding sequence corresponding to conserved domains of known thermostable DNA polymerases can be identified.

In one embodiment of the degenerate primer method, the corresponding conserved domains are from the coding sequences for and amino acid sequences of the thermostable DNA polymerases of Taq, Tma, and Tth. The degenerate primer method was developed by comparing the amino acid sequences of DNA polymerase I proteins from Taq, Tth, T7, and E. coli in which various conserved regions were identified. Primers corresponding to these conserved regions were then designed. As a result of the present invention, Tma sequences can be used to design other degenerate primers. The generic utility of the degenerate primer process is exemplified herein by specific reference to the method as applied to cloning the Tma gene.

To clone the Tma DNA polymerase gene, the conserved amino acid sequences were converted to all of the possible codons for each of the amino acids. Due to the degenerate nature of the genetic code, a given amino acid may be represented by several different codons. Where more than one base can be present in a codon for a given amino acid, the sequence is said to be degenerate.

The primers were then synthesized as a pool of all of the possible DNA sequences that could code for a given amino acid sequence. The amount of degeneracy of a given primer pool can be determined by multiplying the number of possible nucleotides at each position.
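The degeneracy calculation described above can be sketched as follows. This is a minimal illustration, assuming the primer pool is written with the standard IUPAC nucleotide ambiguity codes; the function name and the example primer ATGGAYAAR (a hypothetical pool encoding Met-Asp-Lys) are illustrative assumptions and do not come from the specification.

```python
from math import prod

# Number of nucleotides represented by each standard IUPAC code.
IUPAC_CHOICES = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def pool_degeneracy(primer: str) -> int:
    """Multiply the number of possible nucleotides at each position."""
    return prod(IUPAC_CHOICES[base] for base in primer.upper())


# Hypothetical pool for Met-Asp-Lys: ATG GAY AAR (Y = C/T, R = A/G).
print(pool_degeneracy("ATGGAYAAR"))
```

A fully unambiguous primer has a degeneracy of 1, and each ambiguous position multiplies the number of distinct sequences in the pool.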

The more degenerate a primer pool (i.e., the greater the number of individual unique primer DNA sequences within the pool), the greater the probability that one of the unique primer sequences will bind to regions of the target chromosomal DNA other than the one desired, and hence the lesser the specificity of the resulting amplification. To increase the specificity of the amplification using the degenerate primers, the pools are synthesized as subsets such that the entire group of subsets includes all possible DNA sequences encoding the given amino acid sequence, but each individual subset only includes a portion: for example, one pool may contain either a G or C at a particular position while the other contains either an A or T at the same position. Each of these subpools is designated with a DG number.
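The subset strategy can be illustrated with a short sketch. It assumes IUPAC ambiguity codes, where a fourfold-degenerate position N is split into S (G or C) for one subpool and W (A or T) for the other; the helper names and the example pool GCNAAR are hypothetical and serve only to show that the subpools together cover the original pool while each holds half of the sequences.

```python
from itertools import product

# Bases represented by each standard IUPAC nucleotide code.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand(primer: str) -> set[str]:
    """Enumerate every unique sequence contained in a degenerate pool."""
    choices = (IUPAC_BASES[base] for base in primer.upper())
    return {"".join(bases) for bases in product(*choices)}


def split_at(primer: str, index: int) -> tuple[str, str]:
    """Split a pool at an N position into a G/C subpool and an A/T subpool."""
    if primer[index].upper() != "N":
        raise ValueError("this sketch only splits fourfold-degenerate (N) positions")
    return (primer[:index] + "S" + primer[index + 1:],
            primer[:index] + "W" + primer[index + 1:])


# Hypothetical pool for Ala-Lys: GCN AAR, split at the N (0-based index 2).
gc_pool, at_pool = split_at("GCNAAR", 2)
print(gc_pool, len(expand(gc_pool)))
print(at_pool, len(expand(at_pool)))
```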

Both forward primers (directed from the 5' region toward the 3' region of the gene, complementary to the noncoding strand) and reverse primers (directed from the 3' region toward the 5' region of the gene, complementary to the coding strand) were designed for most of these conserved regions to clone Tma polymerase. The primers were designed with restriction sites at the 5' ends to facilitate cloning. The forward primers contained a BglII restriction site (AGATCT), while the reverse primers contained an EcoRI restriction site (GAATTC). In addition, the primers contained 2 nucleotides at the 5' end to increase the efficiency of cutting at the restriction site.
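The primer layout described above (two extra 5' nucleotides, then the restriction site, then the gene-specific degenerate region) can be sketched as follows. The two clamp nucleotides "CG" and the example coding region ATGGAYAAR are assumptions for illustration only; the specification states the restriction sites but not the identity of the two added bases.

```python
# Complement table covering the standard IUPAC ambiguity codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

BGLII_SITE = "AGATCT"   # used on forward primers
ECORI_SITE = "GAATTC"   # used on reverse primers


def reverse_complement(seq: str) -> str:
    """Reverse complement of a (possibly degenerate) DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def tailed_primer(core: str, site: str, clamp: str = "CG") -> str:
    """Prefix a primer core with 2 clamp bases and a restriction site.

    The default clamp "CG" is a hypothetical choice for this sketch.
    """
    if len(clamp) != 2:
        raise ValueError("the clamp is two nucleotides long")
    return clamp + site + core.upper()


coding_region = "ATGGAYAAR"  # hypothetical conserved region, coding strand
forward = tailed_primer(coding_region, BGLII_SITE)
reverse = tailed_primer(reverse_complement(coding_region), ECORI_SITE)
print(forward)
print(reverse)
```

The forward primer keeps the coding-strand sequence, whereas the reverse primer carries its reverse complement, so both tails end up at the 5' ends of the amplified fragment.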

Degenerate primers were then used in PCR processes in which the target nucleic acid was chromosomal DNA from Thermotoga maritima. The products of the PCR processes using a combination of forward and reverse primer pools in conjunction with a series of temperature profiles were compared. When specific products of similar size to the product generated using Taq chromosomal DNA were produced, the PCR fragments were gel purified, reamplified and cloned into the vector BSM13H3:BglII (a derivative of the Stratagene vector pBSM+ in which the HindIII site of pBSM+ was converted to a BglII site). Sequences were identified as potential thermostable DNA polymerase coding sequences if the sequences were found to encode amino acid sequences homologous to other known amino acid sequences in polymerase proteins, particularly those of Taq polymerase and Tth polymerase.

The portions of the Tma DNA polymerase gene were then identified in the chromosomal DNA of Thermotoga maritima by Southern blot analysis. The Tma chromosomal DNA was digested with a variety of enzymes and transferred to nitrocellulose filters. Probes labeled with ³²P or biotin-dUTP were generated for various regions of the gene from the cloned PCR products. The probes were hybridized to the nitrocellulose-bound genomic DNA, allowing identification of the size of the chromosomal DNA fragment hybridizing to the probe. The use of probes covering the 5' and 3' regions of the gene ensures that the DNA fragment(s) contain most if not all of the structural gene for the polymerase. Restriction enzymes are identified that can be used to produce fragments that contain the structural gene in a single DNA fragment or in several DNA fragments to facilitate cloning.

Once identified, the chromosomal DNA fragments encoding the Tma DNA polymerase gene were cloned. Chromosomal DNA was digested with the identified restriction enzyme and size fractionated. Fractions containing the desired size range were concentrated, desalted, and cloned into the BSM13H3:BglII cloning vector. Clones were identified by hybridization using labeled probes generated from the previous cloned PCR products. The PCR products were then analyzed on polyacrylamide gels.

The DNA sequence and amino acid sequence shown above and the DNA compounds that encode those sequences can be used to design and construct recombinant DNA expression vectors to drive expression of Tma DNA polymerase activity in a wide variety of host cells. A DNA compound encoding all or part of the DNA sequence shown above can also be used as a probe to identify thermostable polymerase-encoding DNA from other organisms, and the amino acid sequence shown above can be used to design peptides for use as immunogens to prepare antibodies that can be used to identify and purify a thermostable polymerase.

Whether produced by recombinant vectors that encode the above amino acid sequence or by native Thermotoga maritima cells, however, Tma DNA polymerase will typically be purified prior to use in a recombinant DNA technique. The present invention provides such purification methodology.

For recovering the native protein, the cells are grown using any suitable technique. Briefly, the cells are grown in "MMS"-medium containing (per liter): NaCl (6.93 g); MgSO₄·7H₂O (1.75 g); MgCl₂·6H₂O (1.38 g); KCl (0.16 g); NaBr (25 mg); H₃BO₃ (7.5 mg); SrCl₂·6H₂O (3.8 mg); KI (0.025 mg); CaCl₂ (0.38 g); KH₂PO₄ (0.5 g); Na₂S (0.5 g); (NH₄)₂Ni(SO₄)₂ (2 mg); trace minerals (Balch et al., 1979, Microbiol. Rev. 43:260-296) (15 ml); resazurin (1 mg); and starch (5 g) at a pH of 6.5 (adjusted with H₂SO₄). For growth on solid medium, 0.8% agar (Oxoid) may be added to the medium. Reasonable growth of the cells also occurs in "SME"-medium (Stetter et al., 1983, Syst. Appl. Microbiol. 4:535-551) supplemented with 0.5% yeast extract, or in marine broth (Difco 2216).

After cell growth, the isolation and purification of the enzyme takes place in six stages, each of which is carried out at a temperature below room temperature, preferably about 0° C. to about 4° C., unless stated otherwise. In the first stage or step, the cells, if frozen, are thawed, lysed in an Aminco French pressure cell (8,000-20,000 psi), suspended in a buffer at about pH 7.5, and sonicated to reduce viscosity.

In the second stage, ammonium sulfate is added to the lysate to prevent the Tma DNA polymerase from binding to DNA or other cell lysate proteins. Also in the second stage, Polymin P (polyethyleneimine, PEI) is added to the lysate to precipitate nucleic acids, and the lysate is centrifuged.

In the third stage, ammonium sulfate is added to the supernatant, and the supernatant is loaded onto a phenyl sepharose column equilibrated with a buffer composed of TE (50 mM Tris-Cl, pH 7.5, and 1 mM EDTA) containing 0.3M ammonium sulfate and 0.5 mM DTT (dithiothreitol). The column is then washed first with the same buffer, second with TE-DTT (without ammonium sulfate), third with ethylene glycol-TE-DTT, and finally with 2M urea in TE-DTT containing ethylene glycol. Unless the capacity of the phenyl sepharose is exceeded (i.e., by loading more than ~20-30 mg of protein per ml of resin), all of the Tma polymerase activity is retained by the column and elutes with the 2M urea in TE-DTT containing ethylene glycol.

In the fourth stage, the urea eluate is applied to a heparin sepharose column which is equilibrated with 0.08M KCl, 50 mM Tris-Cl (pH 7.5), 0.1 mM EDTA, 0.2% Tween 20 and 0.5 mM DTT. The column is then washed in the same buffer and the enzyme eluted with a linear gradient of 0.08M to 0.5M KCl buffer. The peak activity fractions were found at 0.225M to 0.275M KCl.

In the fifth stage, the fraction collected in the fourth stage is diluted with affigel-blue buffer without KCl and applied to an affigel-blue column equilibrated in 25 mM Tris-Cl (pH 7.5), 0.1 mM EDTA, 0.2% Tween 20, 0.5 mM DTT, and 0.15M KCl. The column is washed with the same buffer and eluted with a linear gradient of 0.15M to 0.7M KCl in the same buffer. The peak activity fractions were found at the 0.3M to 0.55M KCl section of the gradient. These fractions of peak activity are then tested for contaminating deoxyribonucleases (endonucleases and exonucleases) using any suitable procedure. As an example, endonuclease activity may be determined electrophoretically from the change in molecular weight of phage λ DNA or supercoiled plasmid DNA after incubation with an excess of DNA polymerase. Similarly, exonuclease activity may be determined electrophoretically from the change in molecular weight of restriction enzyme digested DNA after incubation with an excess of DNA polymerase. The fractions that have no deoxyribonuclease activity are pooled and diafiltered into phosphocellulose buffer containing 50 mM KCl.

Finally, in a sixth stage, the diafiltered pool from stage five is loaded onto a phosphocellulose column equilibrated to the correct pH and ionic strength of 25 mM Tris-Cl (pH 7.5), 50 mM KCl, 0.1 mM EDTA, 0.2% Tween 20, and 0.5 mM DTT. The column is then washed with the same buffer and eluted with a linear 0.05M to 0.5M KCl gradient. The peak fractions eluted between 0.215M and 0.31M KCl. An undegraded, purified DNA polymerase from these fractions is evidenced by an unchanged migration pattern in an in situ activity gel.
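As a rough aid for locating the peak fractions, the position within each linear KCl gradient at which the reported concentrations are reached can be estimated by linear interpolation. The sketch below assumes an ideal linear gradient with no mixing or dead-volume delay; the function name and the dictionary layout are illustrative, while the concentrations are those stated for stages four through six.

```python
def gradient_fraction(conc: float, start: float, end: float) -> float:
    """Fraction (0 to 1) of a linear gradient at which `conc` is reached."""
    if not min(start, end) <= conc <= max(start, end):
        raise ValueError("concentration lies outside the gradient")
    return (conc - start) / (end - start)


# column: (gradient start, gradient end, peak low, peak high), all in M KCl
GRADIENTS = {
    "heparin sepharose": (0.08, 0.5, 0.225, 0.275),
    "affigel-blue": (0.15, 0.7, 0.3, 0.55),
    "phosphocellulose": (0.05, 0.5, 0.215, 0.31),
}

for column, (start, end, low, high) in GRADIENTS.items():
    first = gradient_fraction(low, start, end)
    last = gradient_fraction(high, start, end)
    print(f"{column}: peak between {first:.0%} and {last:.0%} of the gradient")
```

Actual elution positions depend on column geometry and flow conditions, so these fractions are only a first approximation.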

The molecular weight of the DNA polymerase purified from Thermotoga maritima may be determined by any technique, for example, by SDS-PAGE analysis using protein molecular weight markers or by calculation from the coding sequence. The molecular weight of the DNA polymerase purified from Thermotoga maritima is determined by SDS-PAGE to be about 97 kDa. Based on the predicted amino acid sequence, the molecular weight is estimated at about 102 kDa. The purification protocol of native Tma DNA polymerase is described in more detail in Example 1. Purification of the recombinant Tma polymerase of the invention can be carried out with similar methodology.
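A molecular weight estimate from a predicted amino acid sequence can be sketched as the sum of average residue masses plus one water molecule for the free termini. The residue masses below are commonly tabulated average values and should be treated as approximations; the function name is an assumption, and the sketch is not necessarily the calculation used to obtain the figure quoted above.

```python
# Average residue masses in daltons (amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594,
    "N": 114.1038, "D": 115.0886, "Q": 128.1307, "K": 128.1741,
    "E": 129.1155, "M": 131.1926, "H": 137.1411, "F": 147.1766,
    "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.015  # one water for the free N- and C-termini


def molecular_weight_kda(sequence: str) -> float:
    """Estimate the mass in kDa of a protein given in one-letter code."""
    residues = "".join(sequence.split()).upper()  # tolerate spaced blocks
    total = sum(AVERAGE_RESIDUE_MASS[aa] for aa in residues) + WATER_MASS
    return total / 1000.0


# Usage on the first block of the sequence shown in the table above.
print(round(molecular_weight_kda("MARLFLFDGT"), 3))
```

Passing the complete one-letter sequence gives the calculated value for the full-length protein; an apparent molecular weight from SDS-PAGE can differ from such a calculated value.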

Biologically active recombinant Tma polymerases of various molecular weights can be prepared by the methods and vectors of the present invention. Even when the complete coding sequence of the Tma DNA polymerase gene is present in an expression vector in E. coli, the cells produce a truncated polymerase, formed by translation starting with the methionine codon at position 140. One can also use recombinant means to produce a truncated polymerase corresponding to the protein produced by initiating translation at the methionine codon at position 284 of the Tma coding sequence. The polymerase lacking amino acids 1 through 139 (about 86 kDa), and the polymerase lacking amino acids 1 through 283 (about 70 kDa) of the wild type Tma polymerase retain polymerase activity but have attenuated 5'→3' exonuclease activity. In addition, the 70 kDa polymerase is significantly more thermostable than native Tma polymerase.
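The two internal start positions can be checked against the numbered one-letter sequence given earlier. The sketch below copies only the first 300 residues from that table and uses 1-based positions, as in the text; the function name is an illustrative assumption.

```python
# Residues 1-300 of the Tma DNA polymerase, copied from the numbered table.
N_TERMINAL_300 = (
    "MARLFLFDGTALAYRAYYALDRSLSTSTGIPTNATYGVARMLVRFIKDHI"   # 1-50
    "IVGKDYVAVAFDKKAATFRHKLLETYKAQRPKTPDLLIQQLPYIKKLVEA"   # 51-100
    "LGMKVLEVEGYEADDIIATLAVKGLPLFDEIFIVTGDKDMLQLVNEKIKV"   # 101-150
    "WRIVKGISDLELYDAQKVKEKYGVEPQQIPDLLALTGDEIDNIPGVTGIG"   # 151-200
    "EKTAVQLLEKYKDLEDILNHVRELPQKVRKALLRDRENAILSKKLAILET"   # 201-250
    "NVPIEINWEELRYQGYDREKLLPLLKELEFASIMKELQLYEESEPVGYRI"   # 251-300
)


def truncate_at_met(sequence: str, position: int) -> str:
    """Return the sequence starting at a 1-based methionine position."""
    if sequence[position - 1] != "M":
        raise ValueError(f"residue {position} is not methionine")
    return sequence[position - 1:]


for start in (140, 284):
    fragment = truncate_at_met(N_TERMINAL_300, start)
    print(start, fragment[:10])
```

Both positions are methionine residues in the table, so a truncated form beginning at position 140 lacks residues 1 through 139 and one beginning at position 284 lacks residues 1 through 283.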

Thus, the entire sequence of the intact Tma DNA polymerase enzyme is not required for activity. Portions of the Tma DNA polymerase coding sequence can be used in recombinant DNA techniques to produce a biologically active gene product with DNA polymerase activity. The availability of DNA encoding the Tma DNA polymerase sequence provides the opportunity to modify the coding sequence so as to generate mutein (mutant protein) forms also having DNA polymerase activity. The amino (N)-terminal portion of the Tma polymerase is not necessary for polymerase activity but rather encodes the 5'→3' exonuclease activity of the protein. Using recombinant DNA methodology, one can delete approximately up to one-third of the N-terminal coding sequence of the Tma gene, clone, and express a gene product that is quite active in polymerase assays but, depending on the extent of the deletion, has no 5'→3' exonuclease activity. Because certain N-terminal shortened forms of the polymerase are active, the gene constructs used for expression of these polymerases can include the corresponding shortened forms of the coding sequence.

In addition to the N-terminal deletions, individual amino acid residues in the peptide chain of Tma polymerase may be modified by oxidation, reduction, or other derivation, and the protein may be cleaved to obtain fragments that retain activity. Such alterations that do not destroy activity do not remove the protein from the definition of a protein with Tma polymerase activity and so are specifically included within the scope of the present invention.

Modifications to the primary structure of the Tma DNA polymerase coding sequence by deletion, addition, or alteration so as to change the amino acids incorporated into the Tma DNA polymerase during translation of the mRNA produced from that coding sequence can be made without destroying the high temperature DNA polymerase activity of the protein. Such substitutions or other alterations result in the production of proteins having an amino acid sequence encoded by DNA falling within the contemplated scope of the present invention. Likewise, the cloned genomic sequence, or homologous synthetic sequences, of the Tma DNA polymerase gene can be used to express a fusion polypeptide with Tma DNA polymerase activity or to express a protein with an amino acid sequence identical to that of native Tma DNA polymerase. In addition, such expression can be directed by the Tma DNA polymerase gene control sequences or by a control sequence that functions in whatever host is chosen to express the Tma DNA polymerase.

Thus, the present invention provides a coding sequence for Tma DNA polymerase from which expression vectors applicable to a variety of host systems can be constructed and the coding sequence expressed. Portions of the Tma polymerase-encoding sequence are also useful as probes to retrieve other thermostable polymerase-encoding sequences in a variety of species. Accordingly, oligonucleotide probes that encode at least four to six amino acids can be synthesized and used to retrieve additional DNAs encoding a thermostable polymerase. Because there may not be an exact match between the nucleotide sequence of the thermostable DNA polymerase gene of Thermotoga maritima and the corresponding gene of other species, oligomers containing approximately 12-18 nucleotides (encoding the four to six amino acid sequence) are usually necessary to obtain hybridization under conditions of sufficient stringency to eliminate false positives. Sequences encoding six amino acids supply ample information for such probes. Such oligonucleotide probes can be used as primers in the degenerate priming method of the invention to obtain thermostable polymerase encoding sequences.
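The relationship between peptide length and probe length described above (four to six amino acids, hence 12-18 nucleotides) can be illustrated with a minimal Python sketch. It assumes the standard genetic code; the function names and the example peptides are chosen here for illustration only and are not probes named in this specification. The degeneracy value is simply the number of distinct coding sequences that could encode the peptide, which gives a rough sense of how many oligonucleotide species a fully degenerate probe pool would contain.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code
# (61 sense codons in total).
CODON_COUNTS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def probe_length(peptide: str) -> int:
    """Length in nucleotides of an oligonucleotide encoding the peptide."""
    return 3 * len(peptide)


def probe_degeneracy(peptide: str) -> int:
    """Number of distinct coding sequences for the peptide (standard code)."""
    return prod(CODON_COUNTS[aa] for aa in peptide.upper())


if __name__ == "__main__":
    # Illustrative peptides only; hypothetical choices, not recommended probes.
    for pep in ("DLETSS", "YSCED", "MW"):
        print(pep, probe_length(pep), probe_degeneracy(pep))
```

Under these assumptions, peptides rich in methionine or tryptophan yield the least degenerate probes, whereas leucine, serine, and arginine each multiply the pool size by six.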

The present invention, by providing coding sequences and amino acid sequences for Tma DNA polymerase, therefore enables the isolation of other thermostable polymerase enzymes and the coding sequences for those enzymes. The amino acid sequence of the Tma DNA polymerase protein is very similar to the amino acid sequences for the thermostable DNA polymerases of Taq and Tth. These similarities facilitated the identification and isolation of the Tma DNA polymerase coding sequence. The areas of similarity in the coding sequences of these three thermostable DNA polymerases can be readily observed by aligning the sequences.

However, regions of dissimilarity between the coding sequences of the three thermostable DNA polymerases can also be used as probes to identify other thermostable polymerase coding sequences that encode thermostable polymerase enzymes. For example, the coding sequence for a thermostable polymerase having some properties of Taq and other divergent properties of Tma may be identified by using probes directed to sequences that encode the regions of dissimilarity between Taq and Tma. Specifically, such regions include a stretch of four or more contiguous amino acids from any one or more of the following regions, identified by amino acid sequence coordinates (numbering is inclusive): 5-10, 73-79, 113-119, 134-145, 191-196, 328-340, 348-352, 382-387, 405-414, 467-470, 495-499, 506-512, 555-559, 579-584, 595-599, 650-655, 732-742, 820-825, 850-856. These regions may be considered as "hallmark motifs" and define additional regions of critical amino acid signature sequences for thermostable DNA polymerase functions (e.g., 5'→3' exonuclease activity, 3'→5' exonuclease activity, and DNA polymerase activity).

One property found in the Tma DNA polymerase, but lacking in native Taq DNA polymerase and native Tth DNA polymerase, is 3'→5' exonuclease activity. This 3'→5' exonuclease activity is generally considered to be desirable, because misincorporated or unmatched bases of the synthesized nucleic acid sequence are eliminated by this activity. Therefore, the fidelity of PCR utilizing a polymerase with 3'→5' exonuclease activity (e.g., Tma DNA polymerase) is increased. The 3'→5' exonuclease activity found in Tma DNA polymerase also decreases the probability of the formation of primer/dimer complexes in PCR. The 3'→5' exonuclease activity in effect prevents any extra dNTPs from attaching to the 3' end of the primer in a nontemplate dependent fashion by removing any nucleotide that is attached in a nontemplate dependent fashion. The 3'→5' exonuclease activity can eliminate single-stranded DNAs, such as primers or single-stranded template. In essence, every 3'-nucleotide of a single-stranded primer or template is treated by the enzyme as unmatched and is therefore degraded. To avoid primer degradation in PCR, one can add phosphorothioate to the 3' ends of the primers. Phosphorothioate modified nucleotides are more resistant to removal by 3'→5' exonucleases.

A "motif" or characteristic "signature sequence" of amino acids critical for 3'→5' exonuclease activity in thermostable DNA polymerases can be defined as comprising three short domains. Below, these domains are identified as A, B, and C, with critical amino acid residues shown in one letter abbreviation and non-critical residues identified as "x."

    ______________________________________
                           Representative
    Domain      Sequence   Tma Coordinates
    ______________________________________
    A           DxExxxL    323-329
    B           NxxxDxxxL  385-393
    C           YxxxD      464-468
    ______________________________________

The distance between region A and region B is 55-65 amino acids. The distance between region B and region C is 67-75 amino acids, preferably about 70 amino acids. In Tma DNA polymerase, the amino acids that do not define the critical motif signature sequence amino acids are L and TSS, respectively, in domain A; LKF and YKV, respectively, in domain B; and SCE in domain C. Domain A is therefore DLETSSL; domain B is NLKFDYKVL; and domain C is YSCED in Tma DNA polymerase. Thus, the present invention provides a thermostable DNA polymerase possessing 3'→5' exonuclease activity that comprises domains A, B, and C, and, more particularly, comprises the sequence D-X-E-X³-L-X⁵⁵⁻⁶⁵-N-X³-D-X³-L-X⁶⁵⁻⁷⁵-Y-X³-D, where the one letter amino acid abbreviation is used, and Xᴺ represents the number (N) of non-critical amino acids between the specified amino acids.
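The signature sequence given above can be expressed as a regular expression and used to scan a protein sequence. The following Python sketch is illustrative only: the function name is hypothetical, and the test sequence is a synthetic placeholder (glycine filler around the three Tma domain sequences quoted above), not the actual Tma DNA polymerase sequence.

```python
import re

# D-X-E-X3-L-X55-65-N-X3-D-X3-L-X65-75-Y-X3-D written as a regular expression.
EXO_MOTIF = re.compile(
    r"D.E.{3}L"     # domain A: DxExxxL
    r".{55,65}"     # spacer between domains A and B
    r"N.{3}D.{3}L"  # domain B: NxxxDxxxL
    r".{65,75}"     # spacer between domains B and C
    r"Y.{3}D"       # domain C: YxxxD
)


def find_exo_motif(protein: str):
    """Return 1-based inclusive (start, end) of the first match, or None."""
    match = EXO_MOTIF.search(protein.upper())
    if match is None:
        return None
    return match.start() + 1, match.end()


# Synthetic placeholder: filler residues position the three domains at the
# representative Tma coordinates listed in the table above.
toy = ("G" * 322 + "DLETSSL" + "G" * 55 + "NLKFDYKVL"
       + "G" * 70 + "YSCED" + "G" * 10)

if __name__ == "__main__":
    print(find_exo_motif(toy))
```

Because the spacers are expressed as ranges, a match only indicates that a sequence is consistent with the motif; it says nothing by itself about whether the protein has exonuclease activity.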

A thermostable 3'→5' exonuclease domain is represented by amino acids 291 through 484 of Tma DNA polymerase. Accordingly, "domain shuffling" or construction of "thermostable chimeric DNA polymerases" may be used to provide thermostable DNA polymerases containing novel properties. For example, substitution of the Tma DNA polymerase coding sequence comprising codons about 291 through about 484 for the Thermus aquaticus DNA polymerase codons 289-422 would yield a novel thermostable DNA polymerase containing the 5'→3' exonuclease domain of Taq DNA polymerase (1-289), the 3'→5' exonuclease domain of Tma DNA polymerase (291-484), and the DNA polymerase domain of Taq DNA polymerase (423-832). Alternatively, the 5'→3' exonuclease domain and the 3'→5' exonuclease domain of Tma DNA polymerase (ca. codons 1-484) may be fused to the DNA polymerase (dNTP binding and primer/template binding domains) portions of Taq DNA polymerase (ca. codons 423-832). The donors and recipients need not be limited to Taq and Tma DNA polymerases. Tth DNA polymerase provides analogous domains as Taq DNA polymerase. In addition, the enhanced/preferred reverse transcriptase properties of Tth DNA polymerase can be further enhanced by the addition of a 3'→5' exonuclease domain as illustrated above.
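The coordinate bookkeeping for the first chimera described above can be sketched in Python. This is a minimal illustration under stated assumptions: the sketch takes the substitution literally (Taq codons 289-422 are replaced, so Taq residues 1-288 and 423-832 are retained), the function names are hypothetical, and the input strings are placeholders standing in for the real protein sequences, which are not reproduced here. Since the specification gives the boundaries as approximate ("about"), the exact junctions would be chosen by the practitioner.

```python
def segment(seq: str, first: int, last: int) -> str:
    """Return residues first..last using 1-based inclusive numbering."""
    if not (1 <= first <= last <= len(seq)):
        raise ValueError(f"coordinates {first}-{last} outside sequence")
    return seq[first - 1:last]


def taq_tma_chimera(taq: str, tma: str) -> str:
    """Replace Taq residues 289-422 with Tma residues 291-484."""
    return (
        segment(taq, 1, 288)      # Taq 5'->3' exonuclease region
        + segment(tma, 291, 484)  # Tma 3'->5' exonuclease domain
        + segment(taq, 423, 832)  # Taq polymerase domain
    )


if __name__ == "__main__":
    # Placeholder sequences (assumptions): only the lengths matter here.
    taq_placeholder = "t" * 832
    tma_placeholder = "m" * 900
    chimera = taq_tma_chimera(taq_placeholder, tma_placeholder)
    print(len(chimera))
```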

While any of a variety of means may be used to generate chimeric DNA polymerase coding sequences (possessing novel properties), a preferred method employs "overlap" PCR. In this method, the intended junction sequence is designed into the PCR primers (at their 5'-ends). Following the initial amplification of the individual domains, the various products are diluted (ca. 100 to 1000-fold) and combined, denatured, annealed, extended, and then the final forward and reverse primers are added for an otherwise standard PCR.
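One way to picture how the junction sequence is placed at the 5'-ends of the primers in the overlap method is the following Python sketch. It is a simplified illustration, not a validated primer design procedure: the function names and the default lengths of the annealing region and 5' tail are assumptions made here, and the sequences in the usage example are toy strings.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def junction_primers(upstream: str, downstream: str,
                     anneal: int = 18, tail: int = 12):
    """Return (reverse primer for the upstream fragment,
    forward primer for the downstream fragment).

    Each primer carries, at its 5' end, a tail taken from the other
    fragment so that the two first-round products overlap at the junction.
    """
    if len(upstream) < max(anneal, tail) or len(downstream) < max(anneal, tail):
        raise ValueError("fragments shorter than requested primer regions")
    # Forward primer: 5' tail from the end of the upstream fragment,
    # followed by the start of the downstream fragment.
    forward_downstream = upstream[-tail:] + downstream[:anneal]
    # Reverse primer: anneals to the end of the upstream fragment; its 5'
    # tail is complementary to the start of the downstream fragment.
    reverse_upstream = reverse_complement(upstream[-anneal:] + downstream[:tail])
    return reverse_upstream, forward_downstream


if __name__ == "__main__":
    rev, fwd = junction_primers("AAAACCCCGGGG", "TTTTGGGGCCCC", anneal=4, tail=3)
    print(rev, fwd)
```

In this arrangement the two first-round products share a short region of identity spanning the junction, which is what allows them to anneal to each other and be extended after dilution and mixing.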

Thus, the sequence that codes for the 3'→5' exonuclease activity of Tma DNA polymerase can be removed from Tma DNA polymerase or added to other polymerases that lack this activity by recombinant DNA methodology. One can even replace, in a non-thermostable DNA polymerase, the 3'→5' exonuclease activity domain with the thermostable 3'→5' exonuclease domain of Tma polymerase. Likewise, the 3'→5' exonuclease activity domain of a non-thermostable DNA polymerase can be used to replace the 3'→5' exonuclease domain of Tma polymerase (or any other thermostable polymerase) to create a useful polymerase of the invention. Those of skill in the art recognize that the above chimeric polymerases are most easily constructed by recombinant DNA techniques. Similar chimeric polymerases can be constructed by moving the 5'→3' exonuclease domain of one DNA polymerase to another.

Whether one desires to produce an enzyme identical to native Tma DNA polymerase or a derivative or homologue of that enzyme, the production of a recombinant form of Tma polymerase typically involves the construction of an expression vector, the transformation of a host cell with the vector, and culture of the transformed host cell under conditions such that expression will occur.

To construct the expression vector, a DNA is obtained that encodes the mature (used here to include all chimeras or muteins) enzyme or a fusion of the Tma polymerase to an additional sequence that does not destroy activity or to an additional sequence cleavable under controlled conditions (such as treatment with peptidase) to give an active protein. The coding sequence is then placed in operable linkage with suitable control sequences in an expression vector. The vector can be designed to replicate autonomously in the host cell or to integrate into the chromosomal DNA of the host cell. The vector is used to transform a suitable host, and the transformed host is cultured under conditions suitable for expression of recombinant Tma polymerase. The Tma polymerase is isolated from the medium or from the cells, although recovery and purification of the protein may not be necessary in some instances.

Each of the foregoing steps can be done in a variety of ways. For example, the desired coding sequence may be obtained from genomic fragments and used directly in appropriate hosts. The construction of expression vectors operable in a variety of hosts is made using appropriate replicons and control sequences, as set forth generally below. Construction of suitable vectors containing the desired coding and control sequences employs standard ligation and restriction techniques that are well understood in the art. Isolated plasmids, DNA sequences, or synthesized oligonucleotides are cleaved, modified, and religated in the form desired. Suitable restriction sites can, if not normally available, be added to the ends of the coding sequence so as to facilitate construction of an expression vector, as exemplified below.

Site-specific DNA cleavage is performed by treating with suitable restriction enzyme (or enzymes) under conditions that are generally understood in the art and specified by the manufacturers of commercially available restriction enzymes. See, e.g., New England Biolabs, Product Catalog. In general, about 1 μg of plasmid or other DNA is cleaved by one unit of enzyme in about 20 μl of buffer solution; in the examples below, an excess of restriction enzyme is generally used to ensure complete digestion of the DNA. Incubation times of about one to two hours at about 37° C. are typical, although variations can be tolerated. After each incubation, protein is removed by extraction with phenol and chloroform; this extraction can be followed by ether extraction and recovery of the DNA from aqueous fractions by precipitation with ethanol. If desired, size separation of the cleaved fragments may be performed by polyacrylamide gel or agarose gel electrophoresis using standard techniques. See, e.g., Methods in Enzymology, 1980, 65:499-560.

Restriction-cleaved fragments with single-strand "overhanging" termini can be made blunt-ended (double-strand ends) by treating with the large fragment of E. coli DNA polymerase I (Klenow) in the presence of the four deoxynucleoside triphosphates (dNTPs) using incubation times of about 15 to 25 minutes at 20° C. to 25° C. in 50 mM Tris pH 7.6, 50 mM NaCl, 10 mM MgCl₂, 10 mM DTT, and 5 to 10 μM dNTPs. The Klenow fragment fills in at 5' protruding ends, but chews back protruding 3' single strands, even though the four dNTPs are present. If desired, selective repair can be performed by supplying only one of the, or selected, dNTPs within the limitations dictated by the nature of the protruding ends. After treatment with Klenow, the mixture is extracted with phenol/chloroform and ethanol precipitated. Similar results can be achieved using S1 nuclease, because treatment under appropriate conditions with S1 nuclease results in hydrolysis of any single-stranded portion of a nucleic acid.

Synthetic oligonucleotides can be prepared using the triester method of Matteucci et al., 1981, J. Am. Chem. Soc. 103:3185-3191, or automated synthesis methods. Kinasing of single strands prior to annealing or for labeling is achieved using an excess, e.g., approximately 10 units, of polynucleotide kinase to 0.5 μM substrate in the presence of 50 mM Tris, pH 7.6, 10 mM MgCl₂, 5 mM dithiothreitol (DTT), and 1 to 2 μM ATP. If kinasing is for labeling of probe, the ATP will contain high specific activity γ-³²P.

Ligations are performed in 15-30 μl volumes under the following standard conditions and temperatures: 20 mM Tris-Cl, pH 7.5, 10 mM MgCl₂, 10 mM DTT, 33 μg/ml BSA, 10 mM-50 mM NaCl, and either 40 μM ATP and 0.01-0.02 (Weiss) units T4 DNA ligase at 0° C. (for ligation of fragments with complementary single-stranded ends) or 1 mM ATP and 0.3-0.6 units T4 DNA ligase at 14° C. (for "blunt end" ligation). Intermolecular ligations of fragments with complementary ends are usually performed at 33-100 μg/ml total DNA concentrations (5 to 100 nM total ends concentration). Intermolecular blunt end ligations (usually employing a 20 to 30 fold molar excess of linkers, optionally) are performed at 1 μM total ends concentration.
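The correspondence between a mass concentration of DNA and a molar concentration of ends depends on fragment length. The following Python sketch shows the arithmetic under stated assumptions: linear double-stranded fragments of a single length, two ends per fragment, and an average mass of about 660 g/mol per base pair (a common approximation, not a value given in this specification). The function name and the example fragment length are illustrative.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair (approximate average, assumption)


def ends_concentration_nM(dna_ug_per_ml: float, fragment_bp: int) -> float:
    """Molar concentration of ends (nM) for linear double-stranded DNA."""
    grams_per_liter = dna_ug_per_ml * 1e-3          # 1 ug/ml equals 1e-3 g/l
    fragment_molar = grams_per_liter / (fragment_bp * AVG_BP_MASS)  # mol/l
    return 2 * fragment_molar * 1e9                 # two ends per fragment


if __name__ == "__main__":
    # A hypothetical 4000 bp fragment at the lower bound of 33 ug/ml.
    print(ends_concentration_nM(33, 4000))
```

With these assumptions, a 4000 bp fragment at 33 μg/ml corresponds to roughly 25 nM ends, which falls within the 5 to 100 nM range stated above; shorter fragments at the same mass concentration give proportionally higher ends concentrations.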

In vector construction, the vector fragment is commonly treated with bacterial or calf intestinal alkaline phosphatase (BAP or CIAP) to remove the 5' phosphate and prevent religation and reconstruction of the vector. BAP and CIAP digestion conditions are well known in the art, and published protocols usually accompany the commercially available BAP and CIAP enzymes. To recover the nucleic acid fragments, the preparation is extracted with phenol-chloroform and ethanol precipitated to remove AP and purify the DNA. Alternatively, religation can be prevented by restriction enzyme digestion of unwanted vector fragments before or after ligation with the desired vector.

For portions of vectors or coding sequences that require sequence modifications, a variety of site-specific primer-directed mutagenesis methods are available. The polymerase chain reaction (PCR) can be used to perform site-specific mutagenesis. In another technique now standard in the art, a synthetic oligonucleotide encoding the desired mutation is used as a primer to direct synthesis of a complementary nucleic acid sequence of a single-stranded vector, such as pBS13+, that serves as a template for construction of the extension product of the mutagenizing primer. The mutagenized DNA is transformed into a host bacterium, and cultures of the transformed bacteria are plated and identified. The identification of modified vectors may involve transfer of the DNA of selected transformants to a nitrocellulose filter or other membrane and the "lifts" hybridized with kinased synthetic primer at a temperature that permits hybridization of an exact match to the modified sequence but prevents hybridization with the original strand. Transformants that contain DNA that hybridizes with the probe are then cultured and serve as a reservoir of the modified DNA.

In the constructions set forth below, correct ligations for plasmid construction are confirmed by first transforming E. coli strain DG101 or another suitable host with the ligation mixture. Successful transformants are selected by ampicillin, tetracycline or other antibiotic resistance or sensitivity or by using other markers, depending on the mode of plasmid construction, as is understood in the art. Plasmids from the transformants are then prepared according to the method of Clewell et al., 1969, Proc. Natl. Acad. Sci. USA 62:1159, optionally following chloramphenicol amplification (Clewell, 1972, J. Bacteriol. 110:667). Another method for obtaining plasmid DNA is described as the "Base-Acid" extraction method at page 11 of the Bethesda Research Laboratories publication Focus, volume 5, number 2, and very pure plasmid DNA can be obtained by replacing steps 12 through 17 of the protocol with CsCl/ethidium bromide ultracentrifugation of the DNA. The isolated DNA is analyzed by restriction enzyme digestion and/or sequenced by the dideoxy method of Sanger et al., 1977, Proc. Natl. Acad. Sci. USA 74:5463, as further described by Messing et al., 1981, Nuc. Acids Res. 9:309, or by the method of Maxam et al., 1980, Methods in Enzymology 65:499.

The control sequences, expression vectors, and transformation methods are dependent on the type of host cell used to express the gene. Generally, procaryotic, yeast, insect, or mammalian cells are used as hosts. Procaryotic hosts are in general the most efficient and convenient for the production of recombinant proteins and are therefore preferred for the expression of Tma polymerase.

The procaryote most frequently used to express recombinant proteins is E. coli. For cloning and sequencing, and for expression of constructions under control of most bacterial promoters, E. coli K12 strain MM294, obtained from the E. coli Genetic Stock Center under CGSC #6135, can be used as the host. For expression vectors with the P_(L) N_(RBS) control sequence, E. coli K12 strain MC1000 lambda lysogen, N₇ N₅₃ cI₈₅₇ SusP₈₀, ATCC 39531, may be used. E. coli DG116, which was deposited with the ATCC (ATCC 53606) on Apr. 7, 1987, and E. coli KB2, which was deposited with the ATCC (ATCC 53075) on Mar. 29, 1985, are also useful host cells. For M13 phage recombinants, E. coli strains susceptible to phage infection, such as E. coli K12 strain DG98, are employed. The DG98 strain was deposited with the ATCC (ATCC 39768) on Jul. 13, 1984.

However, microbial strains other than E. coli can also be used, such as bacilli, for example Bacillus subtilis, various species of Pseudomonas, and other bacterial strains, for recombinant expression of Tma DNA polymerase. In such procaryotic systems, plasmid vectors that contain replication sites and control sequences derived from the host or a species compatible with the host are typically used.

For example, E. coli is typically transformed using derivatives of pBR322, described by Bolivar et al., 1977, Gene 2:95. Plasmid pBR322 contains genes for ampicillin and tetracycline resistance. These drug resistance markers can be either retained or destroyed in constructing the desired vector and so help to detect the presence of a desired recombinant. Commonly used procaryotic control sequences, i.e., a promoter for transcription initiation, optionally with an operator, along with a ribosome binding site sequence, include the β-lactamase (penicillinase) and lactose (lac) promoter systems (Chang et al., 1977, Nature 198:1056), the tryptophan (trp) promoter system (Goeddel et al., 1980, Nuc. Acids Res. 8:4057), and the lambda-derived P_(L) promoter (Shimatake et al., 1981, Nature 292:128) and N-gene ribosome binding site (N_(RBS)). A portable control system cassette is set forth in U.S. Pat. No. 4,711,845, issued Dec. 8, 1987. This cassette comprises a P_(L) promoter operably linked to the N_(RBS) in turn positioned upstream of a third DNA sequence having at least one restriction site that permits cleavage within six bp 3' of the N_(RBS) sequence. Also useful is the phosphatase A (phoA) system described by Chang et al. in European Patent Publication No. 196,864, published Oct. 8, 1986. However, any available promoter system compatible with procaryotes can be used to construct a Tma expression vector of the invention.

The nucleotide sequence of the Tma insert may negatively affect the efficiency of the upstream ribosomal binding site, resulting in low levels of translated polymerase. The translation of the Tma gene can be enhanced by the construction of "translationally coupled" derivatives of the expression vectors. An expression vector can be constructed with a secondary translation initiation signal and short coding sequence just upstream of the Tma gene coding sequence such that the stop codon for the short coding sequence is "coupled" with the ATG start codon for the Tma gene coding sequence. A secondary translation initiation signal that efficiently initiates translation can be inserted upstream of the Tma gene start codon. Translation of the short coding sequence brings the ribosome into close proximity with the Tma gene translation initiation site, thereby enhancing translation of the Tma gene. For example, one expression system can utilize the translation initiation signal and first ten codons of the T7 bacteriophage major capsid protein (gene 10) fused in-frame to the last six codons of TrpE. The TGA (stop) codon for TrpE is "coupled" with the ATG (start) codon for the Tma gene coding sequence, forming the sequence TGATG. A one base frame-shift is required between translation of the short coding sequence and translation of the Tma coding sequence. These derivative expression vectors can be constructed by recombinant DNA methods.
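The overlap of the TGA stop codon with the ATG start codon in the sequence TGATG, and the resulting change of reading frame, can be checked with a short Python sketch. The function name and the toy sequence below are hypothetical; the toy sequence is not the T7 gene 10/TrpE construct described above and serves only to show the frame arithmetic.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def coupled_start(seq: str, orf_start: int = 0):
    """Read the short upstream ORF codon by codon from orf_start.

    If its first in-frame stop codon is a TGA that overlaps an ATG (TGATG),
    return (stop_index, start_index, frame_offset), where frame_offset is
    the downstream start position relative to the upstream frame, modulo 3.
    Return None otherwise. Indices are 0-based.
    """
    seq = seq.upper()
    for i in range(orf_start, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            if codon == "TGA" and seq[i:i + 5] == "TGATG":
                start = i + 2  # the A of TGA is also the A of ATG
                return i, start, (start - orf_start) % 3
            return None
    return None


if __name__ == "__main__":
    # Toy construct: short ORF (ATG GCT AGC TGA) coupled to ATG AAA CCC TAA.
    toy_construct = "ATGGCTAGC" + "TGATG" + "AAACCCTAA"
    print(coupled_start(toy_construct))
```

A frame offset of 2 is equivalent to moving back one base relative to the upstream reading frame, which is the one base frame-shift mentioned above.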

The redundancy of the genetic code can also be related to a low translation efficiency. Typically, when multiple codons coding the same amino acid occur, one of the possible codons is preferentially used in an organism. Frequently, an organism accumulates the tRNA species corresponding to the preferred codons at a higher level than those corresponding to rarely used codons. If the pattern of codon usage differs between Thermotoga maritima and the host cell, the tRNA species necessary for translation of the Tma polymerase gene may be in low abundance. In the Tma coding sequence, arginine is most frequently coded for by the "AGA" codon, whereas this codon is used at low frequency in E. coli genes, and the corresponding tRNA is present in low concentration in E. coli host cells. Consequently, the low concentration in the E. coli host cell of "Arg U" tRNA for the "AGA" codon may limit the translation efficiency of the Tma polymerase gene RNA in E. coli host cells. The efficiency of translation of the Tma coding sequence within an E. coli host cell may be improved by increasing the concentration of this Arg tRNA species by expressing multiple copies of this tRNA gene in the host cell.
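A simple way to examine arginine codon usage in a coding sequence is to tally the six arginine codons of the standard genetic code. The Python sketch below is illustrative: the function name is hypothetical and the example input is a toy sequence, not the Tma coding sequence.

```python
from collections import Counter

# The six arginine codons of the standard genetic code.
ARG_CODONS = ("CGT", "CGC", "CGA", "CGG", "AGA", "AGG")


def arginine_codon_usage(cds: str) -> dict:
    """Count each arginine codon in an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore any trailing partial codon
    codons = (cds[i:i + 3] for i in range(0, usable, 3))
    counts = Counter(c for c in codons if c in ARG_CODONS)
    return {codon: counts[codon] for codon in ARG_CODONS}


if __name__ == "__main__":
    # Toy coding sequence: ATG AGA AGA CGT AGA TAA
    print(arginine_codon_usage("ATGAGAAGACGTAGATAA"))
```

Comparing such a tally for a gene of interest against a codon usage table for the intended host is one way to flag codons, such as AGA in E. coli, whose cognate tRNA may be limiting.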

In addition to bacteria, eucaryotic microbes, such as yeast, can also be used as recombinant host cells. Laboratory strains of Saccharomyces cerevisiae, baker's yeast, are most often used, although a number of other strains are commonly available. While vectors employing the two micron origin of replication are common (Broach, 1983, Meth. Enz. 101:307), other plasmid vectors suitable for yeast expression are known (see, for example, Stinchcomb et al., 1979, Nature 282:39; Tschumper et al., 1980, Gene 10:157; and Clarke et al., 1983, Meth. Enz. 101:300). Control sequences for yeast vectors include promoters for the synthesis of glycolytic enzymes (Hess et al., 1968, J. Adv. Enzyme Reg. 7:149; Holland et al., 1978, Biochemistry 17:4900; and Holland et al., 1981, J. Biol. Chem. 256:1385). Additional promoters known in the art include the promoter for 3-phosphoglycerate kinase (Hitzeman et al., 1980, J. Biol. Chem. 255:2073) and those for other glycolytic enzymes, such as glyceraldehyde 3-phosphate dehydrogenase, hexokinase, pyruvate decarboxylase, phosphofructokinase, glucose-6-phosphate isomerase, 3-phosphoglycerate mutase, pyruvate kinase, triosephosphate isomerase, phosphoglucose isomerase, and glucokinase. Other promoters that have the additional advantage of transcription controlled by growth conditions are the promoter regions for alcohol dehydrogenase 2, isocytochrome C, acid phosphatase, degradative enzymes associated with nitrogen metabolism, and enzymes responsible for maltose and galactose utilization (Holland, supra).

Terminator sequences may also be used to enhance expression when placed at the 3' end of the coding sequence. Such terminators are found in the 3' untranslated region following the coding sequences in yeast-derived genes. Any vector containing a yeast-compatible promoter, origin of replication, and other control sequences is suitable for use in constructing yeast Tma expression vectors.

The Tma gene can also be expressed in eucaryotic host cell cultures derived from multicellular organisms. See, for example, Tissue Culture, Academic Press, Cruz and Patterson, editors (1973). Useful host cell lines include COS-7, COS-A2, CV-1, murine cells such as murine myelomas N51 and VERO, HeLa cells, and Chinese hamster ovary (CHO) cells. Expression vectors for such cells ordinarily include promoters and control sequences compatible with mammalian cells such as, for example, the commonly used early and late promoters from Simian Virus 40 (SV40) (Fiers et al., 1978, Nature 273:113), or other viral promoters such as those derived from polyoma, adenovirus 2, bovine papilloma virus (BPV), or avian sarcoma viruses, or immunoglobulin promoters and heat shock promoters. A system for expressing DNA in mammalian systems using a BPV vector system is disclosed in U.S. Pat. No. 4,419,446. A modification of this system is described in U.S. Pat. No. 4,601,978. General aspects of mammalian cell host system transformations have been described by Axel, U.S. Pat. No. 4,399,216. "Enhancer" regions are also important in optimizing expression; these are, generally, sequences found upstream of the promoter region. Origins of replication may be obtained, if needed, from viral sources. However, integration into the chromosome is a common mechanism for DNA replication in eucaryotes.

Plant cells can also be used as hosts, and control sequences compatible with plant cells, such as the nopaline synthase promoter and polyadenylation signal sequences (Depicker et al., 1982, J. Mol. Appl. Gen. 1:561) are available. Expression systems employing insect cells utilizing the control systems provided by baculovirus vectors have also been described (Miller et al., 1986, Genetic Engineering (Setlow et al., eds., Plenum Publishing) 8:277-297). Insect cell-based expression can be accomplished in Spodoptera frugiperda. These systems can also be used to produce recombinant Tma polymerase.

Depending on the host cell used, transformation is done using standard techniques appropriate to such cells. The calcium treatment employing calcium chloride, as described by Cohen, 1972, Proc. Natl. Acad. Sci. USA 69:2110, is used for procaryotes or other cells that contain substantial cell wall barriers. Infection with Agrobacterium tumefaciens (Shaw et al., 1983, Gene 23:315) is used for certain plant cells. For mammalian cells, the calcium phosphate precipitation method of Graham and van der Eb, 1978, Virology 52:546, is preferred. Transformations into yeast are carried out according to the method of Van Solingen et al., 1977, J. Bact. 130:946, and Hsiao et al., 1979, Proc. Natl. Acad. Sci. USA 76:3829.

Once the Tma DNA polymerase has been expressed in a recombinant host cell, purification of the protein may be desired. Although a variety of purification procedures can be used to purify the recombinant thermostable polymerase of the invention, fewer steps may be necessary to yield an enzyme preparation of equal purity. Because E. coli host proteins are heat-sensitive, the recombinant thermostable Tma DNA polymerase can be substantially enriched by heat inactivating the crude lysate. This step is done in the presence of a sufficient amount of salt (typically 0.3 M ammonium sulfate) to ensure dissociation of the Tma DNA polymerase from the host DNA and to reduce ionic interactions of Tma DNA polymerase with other cell lysate proteins. In addition, the presence of 0.3 M ammonium sulfate promotes hydrophobic interaction with a phenyl sepharose column. Hydrophobic interaction chromatography is a separation technique in which substances are separated on the basis of differing strengths of hydrophobic interaction with an uncharged bed material containing hydrophobic groups. Typically, the column is first equilibrated under conditions favorable to hydrophobic binding, such as high ionic strength. A descending salt gradient may then be used to elute the sample.

According to the invention, an aqueous mixture (containing either native or recombinant Tma DNA polymerase) is loaded onto a column containing a relatively strong hydrophobic gel such as phenyl sepharose (manufactured by Pharmacia) or Phenyl TSK (manufactured by Toyo Soda). To promote hydrophobic interaction with a phenyl sepharose column, a solvent is used that contains, for example, greater than or equal to 0.3 M ammonium sulfate, with 0.3 M being preferred, or greater than or equal to 0.5 M NaCl. The column and the sample are adjusted to 0.3 M ammonium sulfate in 50 mM Tris (pH 7.5) and 1.0 mM EDTA ("TE") buffer that also contains 0.5 mM DTT, and the sample is applied to the column. The column is washed with the 0.3 M ammonium sulfate buffer. The enzyme may then be eluted with solvents that attenuate hydrophobic interactions, such as decreasing salt gradients, ethylene or propylene glycol, or urea. For native Tma DNA polymerase, a preferred embodiment involves washing the column with 2 M urea and 20% ethylene glycol in TE-DTT.

For long-term stability, Tma DNA polymerase enzyme can be stored in a buffer that contains one or more non-ionic polymeric detergents. Such detergents are generally those that have a molecular weight in the range of approximately 100 to 250,000 daltons, preferably about 4,000 to 200,000 daltons, and stabilize the enzyme at a pH of from about 3.5 to about 9.5, preferably from about 4 to 8.5. Examples of such detergents include those specified on pages 295-298 of McCutcheon's Emulsifiers & Detergents, North American edition (1983), published by the McCutcheon Division of MC Publishing Co., 175 Rock Road, Glen Rock, N.J. (USA) and copending Ser. No. 387,003, filed Jul. 28, 1989, each of which is incorporated herein by reference.

Preferably, the detergents are selected from the group comprising ethoxylated fatty alcohol ethers and lauryl ethers, ethoxylated alkyl phenols, octylphenoxy polyethoxy ethanol compounds, modified oxyethylated and/or oxypropylated straight-chain alcohols, polyethylene glycol monooleate compounds, polysorbate compounds, and phenolic fatty alcohol ethers. More particularly preferred are Tween 20, a polyoxyethylated (20) sorbitan monolaurate from ICI Americas Inc., Wilmington, Del., and Iconol NP-40, an ethoxylated alkyl phenol (nonyl) from BASF Wyandotte Corp., Parsippany, N.J.

The thermostable enzyme of this invention may be used for any purpose in which such enzyme activity is necessary or desired. In a particularly preferred embodiment, the enzyme catalyzes the nucleic acid amplification reaction known as PCR. This process for amplifying nucleic acid sequences is disclosed and claimed in U.S. Pat. Nos. 4,683,202 and 4,865,188, each of which is incorporated herein by reference. The PCR nucleic acid amplification method involves amplifying at least one specific nucleic acid sequence contained in a nucleic acid or a mixture of nucleic acids and, in the most common embodiment, produces double-stranded DNA.

For ease of discussion, the protocol set forth below assumes that the specific sequence to be amplified is contained in a double-stranded nucleic acid. However, the process is equally useful in amplifying single-stranded nucleic acid, such as mRNA, although in the preferred embodiment the ultimate product is still double-stranded DNA. In the amplification of a single-stranded nucleic acid, the first step involves the synthesis of a complementary strand (one of the two amplification primers can be used for this purpose), and the succeeding steps proceed as in the double-stranded amplification process described below.

This amplification process comprises the steps of:

(a) contacting each nucleic acid strand with four different nucleoside triphosphates and two oligonucleotide primers for each specific sequence being amplified, wherein each primer is selected to be substantially complementary to the different strands of the specific sequence, such that the extension product synthesized from one primer, when separated from its complement, can serve as a template for synthesis of the extension product of the other primer, said contacting being at a temperature that allows hybridization of each primer to a complementary nucleic acid strand;

(b) contacting each nucleic acid strand, at the same time as or after step (a), with a DNA polymerase from Thermotoga maritima that enables combination of the nucleoside triphosphates to form primer extension products complementary to each strand of the specific nucleic acid sequence;

(c) maintaining the mixture from step (b) at an effective temperature for an effective time to promote the activity of the enzyme and to synthesize, for each different sequence being amplified, an extension product of each primer that is complementary to each nucleic acid strand template, but not so high as to separate each extension product from the complementary strand template;

(d) heating the mixture from step (c) for an effective time and at an effective temperature to separate the primer extension products from the templates on which they were synthesized to produce single-stranded molecules but not so high as to denature irreversibly the enzyme;

(e) cooling the mixture from step (d) for an effective time and to an effective temperature to promote hybridization of a primer to each of the single-stranded molecules produced in step (d); and

(f) maintaining the mixture from step (e) at an effective temperature for an effective time to promote the activity of the enzyme and to synthesize, for each different sequence being amplified, an extension product of each primer that is complementary to each nucleic acid template produced in step (d) but not so high as to separate each extension product from the complementary strand template. The effective times and temperatures in steps (e) and (f) may coincide, so that steps (e) and (f) can be carried out simultaneously. Steps (d)-(f) are repeated until the desired level of amplification is obtained.
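The repetition of steps (d)-(f) can be pictured as a simple temperature program. The following Python sketch is illustrative only: the class and function names are hypothetical, and the temperatures, hold times, and cycle count are assumed values chosen from within the ranges discussed in this specification, not a prescribed protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One temperature hold in the cycling program (hypothetical helper type)."""
    name: str
    temp_c: float
    seconds: int


def build_profile(cycles, denature, anneal, extend):
    """Return the flat list of holds for `cycles` repetitions of steps (d)-(f)."""
    profile = []
    for _ in range(cycles):
        profile.extend([denature, anneal, extend])
    return profile


def total_hold_seconds(profile):
    """Sum of programmed hold times; ramp times between holds are ignored."""
    return sum(step.seconds for step in profile)


# Assumed example values, not taken from a specific example in the text.
denature = Step("denature (d)", 95.0, 30)
anneal = Step("anneal (e)", 55.0, 30)
extend = Step("extend (f)", 72.0, 60)
profile = build_profile(25, denature, anneal, extend)
```

Where steps (e) and (f) coincide, the same sketch applies with a single combined hold in place of the separate anneal and extend entries.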

The amplification method is useful not only for producing large amounts of a specific nucleic acid sequence of known sequence but also for producing nucleic acid sequences that are known to exist but are not completely specified. One need know only a sufficient number of bases at both ends of the sequence in sufficient detail so that two oligonucleotide primers can be prepared that will hybridize to different strands of the desired sequence at relative positions along the sequence such that an extension product synthesized from one primer, when separated from the template (complement), can serve as a template for extension of the other primer into a nucleic acid sequence of defined length. The greater the knowledge about the bases at both ends of the sequence, the greater can be the specificity of the primers for the target nucleic acid sequence and the efficiency of the process.

In any case, an initial copy of the sequence to be amplified must be available, although the sequence need not be pure or a discrete molecule. In general, the amplification process involves a chain reaction for producing, in exponential quantities relative to the number of reaction steps involved, at least one specific nucleic acid sequence given that (a) the ends of the required sequence are known in sufficient detail that oligonucleotides can be synthesized that will hybridize to them and (b) that a small amount of the sequence is available to initiate the chain reaction. The product of the chain reaction will be a discrete nucleic acid duplex with termini corresponding to the 5' ends of the specific primers employed.
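The exponential character of the chain reaction can be illustrated numerically. The sketch below assumes the common idealized model in which each cycle multiplies the number of target copies by (1 + efficiency); the function name and the efficiency parameter are illustrative assumptions and are not defined in this specification.

```python
def copies_after(initial_copies, cycles, efficiency=1.0):
    """Idealized copy number after `cycles` rounds of amplification.

    efficiency = 1.0 means perfect doubling each cycle (an assumption);
    real reactions fall short of this and eventually plateau.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles
```

Under perfect doubling, a single starting copy would yield 2^30 (about 10^9) copies after 30 cycles, which is why a small amount of the sequence suffices to initiate the reaction.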

Any nucleic acid sequence, in purified or nonpurified form, can be utilized as the starting nucleic acid(s), provided it contains or is suspected to contain the specific nucleic acid sequence one desires to amplify. The nucleic acid to be amplified can be obtained from any source, for example, from plasmids such as pBR322, from cloned DNA or RNA, or from natural DNA or RNA from any source, including bacteria, yeast, viruses, organelles, and higher organisms such as plants and animals. DNA or RNA may be extracted from blood, tissue material such as chorionic villi, or amniotic cells by a variety of techniques. See, e.g., Maniatis et al., supra, pp. 280-281. Thus, the process may employ, for example, DNA or RNA, including messenger RNA, which DNA or RNA may be single-stranded or double-stranded. In addition, a DNA-RNA hybrid that contains one strand of each may be utilized. A mixture of any of these nucleic acids can also be employed, as can nucleic acids produced from a previous amplification reaction (using the same or different primers). The specific nucleic acid sequence to be amplified can be only a fraction of a large molecule or can be present initially as a discrete molecule, so that the specific sequence constitutes the entire nucleic acid.

The sequence to be amplified need not be present initially in a pure form; the sequence can be a minor fraction of a complex mixture, such as a portion of the β-globin gene contained in whole human DNA (as exemplified in Saiki et al., 1985, Science 230:1530-1534) or a portion of a nucleic acid sequence due to a particular microorganism, which organism might constitute only a very minor fraction of a particular biological sample. The cells can be directly used in the amplification process after suspension in hypotonic buffer and heat treatment at about 90°-100° C. until cell lysis and dispersion of intracellular components occur (generally 1 to 15 minutes). After the heating step, the amplification reagents may be added directly to the lysed cells. The starting nucleic acid sequence can contain more than one desired specific nucleic acid sequence. The amplification process is useful not only for producing large amounts of one specific nucleic acid sequence but also for amplifying simultaneously more than one different specific nucleic acid sequence located on the same or different nucleic acid molecules.

Primers play a key role in the PCR process. The word "primer" as used in describing the amplification process can refer to more than one primer, particularly in the case where there is some ambiguity in the information regarding the terminal sequence(s) of the fragment to be amplified or where one employs the degenerate primer process of the invention. For instance, in the case where a nucleic acid sequence is inferred from protein sequence information, a collection of primers containing sequences representing all possible codon variations based on degeneracy of the genetic code will be used for each strand. One primer from this collection will be sufficiently homologous with the end of the desired sequence to be amplified to be useful for amplification.
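The size of such a primer collection follows directly from the degeneracy of the genetic code. As a minimal sketch, assuming the standard genetic code and a short peptide written in one-letter amino acid symbols, the following Python enumerates every coding-strand sequence that could encode the peptide. The function and variable names are hypothetical; a primer for the opposite strand would be the reverse complement of these sequences.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid (or '*' for stop) to the list of codons that encode it.
CODONS_FOR = {}
for index, (b1, b2, b3) in enumerate(product(BASES, repeat=3)):
    CODONS_FOR.setdefault(AMINO_ACIDS[index], []).append(b1 + b2 + b3)


def degenerate_primers(peptide):
    """Return all coding-strand DNA sequences that encode `peptide`."""
    choices = [CODONS_FOR[residue] for residue in peptide.upper()]
    return ["".join(codons) for codons in product(*choices)]
```

Because the number of sequences is the product of the codon counts at each position, peptides rich in methionine and tryptophan (one codon each) give far smaller collections than peptides rich in leucine, serine, or arginine (six codons each).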

In addition, more than one specific nucleic acid sequence can be amplified from the first nucleic acid or mixture of nucleic acids, so long as the appropriate number of different oligonucleotide primers are utilized. For example, if two different specific nucleic acid sequences are to be produced, four primers are utilized. Two of the primers are specific for one of the specific nucleic acid sequences, and the other two primers are specific for the second specific nucleic acid sequence. In this manner, each of the two different specific sequences can be produced exponentially by the present process.

A sequence within a given sequence can be amplified after a given number of amplification cycles to obtain greater specificity in the reaction by adding, after at least one cycle of amplification, a set of primers that are complementary to internal sequences (i.e., sequences that are not on the ends) of the sequence to be amplified. Such primers can be added at any stage and will provide a shorter amplified fragment. Alternatively, a longer fragment can be prepared by using primers with non-complementary ends but having some overlap with the primers previously utilized in the amplification.

Primers also play a key role when the amplification process is used for in vitro mutagenesis. The product of an amplification reaction where the primers employed are not exactly complementary to the original template will contain the sequence of the primer rather than the template, thereby introducing an in vitro mutation. In further cycles, this mutation will be amplified with an undiminished efficiency because no further mispaired priming is required. The process of making an altered DNA sequence as described above could be repeated on the altered DNA using different primers to induce further sequence changes. In this way, a series of mutated sequences can gradually be produced wherein each new addition to the series differs from the last in a minor way, but from the original DNA source sequence in an increasingly major way.

Because the primer can contain as part of its sequence a non-complementary sequence, provided that a sufficient amount of the primer contains a sequence that is complementary to the strand to be amplified, many other advantages can be realized. For example, a nucleotide sequence that is not complementary to the template sequence (such as, e.g., a promoter, linker, coding sequence, etc.) may be attached at the 5' end of one or both of the primers and so appended to the product of the amplification process. After the extension primer is added, sufficient cycles are run to achieve the desired amount of new template containing the non-complementary nucleotide insert. This allows production of large quantities of the combined fragments in a relatively short period of time (e.g., two hours or less) using a simple technique.

Oligonucleotide primers can be prepared using any suitable method, such as, for example, the phosphotriester and phosphodiester methods described above, or automated embodiments thereof. In one such automated embodiment, diethylphosphoramidites are used as starting materials and can be synthesized as described by Beaucage et al., 1981, Tetrahedron Letters 22:1859-1862. One method for synthesizing oligonucleotides on a modified solid support is described in U.S. Pat. No. 4,458,066. One can also use a primer that has been isolated from a biological source (such as a restriction endonuclease digest).

No matter what primers are used, however, the reaction mixture must contain a template for PCR to occur, because the specific nucleic acid sequence is produced by using a nucleic acid containing that sequence as a template. The first step involves contacting each nucleic acid strand with four different nucleoside triphosphates and two oligonucleotide primers for each specific nucleic acid sequence being amplified or detected. If the nucleic acids to be amplified or detected are DNA, then the nucleoside triphosphates are usually dATP, dCTP, dGTP, and dTTP, although various nucleotide derivatives can also be used in the process. The concentration of nucleoside triphosphates can vary widely. Typically, the concentration is 50 to 200 μM in each dNTP in the buffer for amplification, and MgCl₂ is present in the buffer in an amount of 1 to 3 mM to activate the polymerase and increase the specificity of the reaction. However, dNTP concentrations of 1 to 20 μM may be preferred for some applications, such as DNA sequencing or generating radiolabeled probes at high specific activity.

The nucleic acid strands of the target nucleic acid serve as templates for the synthesis of additional nucleic acid strands, which are extension products of the primers. This synthesis can be performed using any suitable method, but generally occurs in a buffered aqueous solution, preferably at a pH of 7 to 9, most preferably about 8. To facilitate synthesis, a molar excess of the two oligonucleotide primers is added to the buffer containing the template strands. As a practical matter, the amount of primer added will generally be in molar excess over the amount of complementary strand (template) when the sequence to be amplified is contained in a mixture of complicated long-chain nucleic acid strands. A large molar excess is preferred to improve the efficiency of the process. Accordingly, primer:template ratios of at least 1000:1 or higher are generally employed for cloned DNA templates, and primer:template ratios of about 10⁸:1 or higher are generally employed for amplification from complex genomic samples.
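A primer:template ratio can be estimated by converting the amount of primer to a number of molecules and dividing by the number of template copies. The sketch below is a rough illustration; the function names are hypothetical, and the primer amount and template copy number used in the usage note are assumed values, not figures from this specification.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def molecules_from_pmol(pmol):
    """Convert picomoles of an oligonucleotide to a number of molecules."""
    return pmol * 1e-12 * AVOGADRO


def primer_to_template_ratio(primer_pmol, template_copies):
    """Molar (molecule-for-molecule) excess of one primer over its template."""
    return molecules_from_pmol(primer_pmol) / template_copies
```

For instance, under these assumptions 50 pmol of a primer corresponds to roughly 3×10¹³ molecules, so a reaction containing about 3×10⁵ copies of a genomic target would have a primer:template ratio on the order of 10⁸:1.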

The mixture of template, primers, and nucleoside triphosphates is then treated according to whether the nucleic acids being amplified or detected are double- or single-stranded. If the nucleic acids are single-stranded, then no denaturation step need be employed prior to the first extension cycle, and the reaction mixture is held at a temperature that promotes hybridization of the primer to its complementary target (template) sequence. Such temperature is generally from about 35° C. to 65° C. or more, preferably about 37° C. to 60° C., for an effective time, generally from a few seconds to five minutes, preferably from 30 seconds to one minute. A hybridization temperature of 35° C. to 70° C. may be used for Tma DNA polymerase. Primers that are 15 nucleotides or longer in length are used to increase the specificity of primer hybridization. Shorter primers require lower hybridization temperatures.

The complement to the original single-stranded nucleic acids can be synthesized by adding Tma DNA polymerase in the presence of the appropriate buffer, dNTPs, and one or more oligonucleotide primers. If an appropriate single primer is added, the primer extension product will be complementary to the single-stranded nucleic acid and will be hybridized with the nucleic acid strand in a duplex of strands of equal or unequal length (depending on where the primer hybridizes to the template), which may then be separated into single strands as described above to produce two single, separated, complementary strands. A second primer would then be added so that subsequent cycles of primer extension would occur using both the original single-stranded nucleic acid and the extension product of the first primer as templates. Alternatively, two or more appropriate primers (one of which will prime synthesis using the extension product of the other primer as a template) can be added to the single-stranded nucleic acid and the reaction carried out.

If the nucleic acid contains two strands, as in the case of amplification of a double-stranded target or second-cycle amplification of a single-stranded target, the strands of nucleic acid must be separated before the primers are hybridized. This strand separation can be accomplished by any suitable denaturing method, including physical, chemical or enzymatic means. One preferred physical method of separating the strands of the nucleic acid involves heating the nucleic acid until complete (>99%) denaturation occurs. Typical heat denaturation involves temperatures ranging from about 80° C. to 105° C. for times generally ranging from about a few seconds to minutes, depending on the composition and size of the nucleic acid. Preferably, the effective denaturing temperature is 90°-100° C. for a few seconds to 1 minute. Strand separation may also be induced by an enzyme from the class of enzymes known as helicases or the enzyme RecA, which has helicase activity and in the presence of riboATP is known to denature DNA. The reaction conditions suitable for separating the strands of nucleic acids with helicases are described by Kuhn Hoffmann-Berling, 1978, CSH-Quantitative Biology 43:63, and techniques for using RecA are reviewed in Radding, 1982, Ann. Rev. Genetics 16:405-437. The denaturation produces two separated complementary strands of equal or unequal length.

If the double-stranded nucleic acid is denatured by heat, the reaction mixture is allowed to cool to a temperature that promotes hybridization of each primer to the complementary target (template) sequence. This temperature is usually from about 35° C. to 65° C. or more, depending on reagents, preferably 37° C. to 60° C. The hybridization temperature is maintained for an effective time, generally a few seconds to minutes, and preferably 10 seconds to 1 minute. In practical terms, the temperature is simply lowered from about 95° C. to as low as 37° C., and hybridization occurs at a temperature within this range.

Whether the nucleic acid is single- or double-stranded, the DNA polymerase from Thermotoga maritima can be added prior to or during the denaturation step or when the temperature is being reduced to or is in the range for promoting hybridization. Although the thermostability of Tma polymerase allows one to add Tma polymerase to the reaction mixture at any time, one can substantially inhibit non-specific amplification by adding the polymerase to the reaction mixture at a point in time when the mixture will not be cooled below the stringent hybridization temperature. After hybridization, the reaction mixture is then heated to or maintained at a temperature at which the activity of the enzyme is promoted or optimized, i.e., a temperature sufficient to increase the activity of the enzyme in facilitating synthesis of the primer extension products from the hybridized primer and template. The temperature must actually be sufficient to synthesize an extension product of each primer that is complementary to each nucleic acid template, but must not be so high as to denature each extension product from its complementary template (i.e., the temperature is generally less than about 80° C. to 90° C.).

Depending on the nucleic acid(s) employed, the typical temperature effective for this synthesis reaction generally ranges from about 40° C. to 80° C., preferably 50° C. to 75° C. The temperature more preferably ranges from about 65° C. to 75° C. for Thermotoga maritima DNA polymerase. The period of time required for this synthesis may range from about 10 seconds to several minutes or more, depending mainly on the temperature, the length of the nucleic acid, the enzyme, and the complexity of the nucleic acid mixture. The extension time is usually about 30 seconds to a few minutes. If the nucleic acid is longer, a longer time period is generally required for complementary strand synthesis.

The newly synthesized strand and the complement nucleic acid strand form a double-stranded molecule that is used in the succeeding steps of the amplification process. In the next step, the strands of the double-stranded molecule are separated by heat denaturation at a temperature and for a time effective to denature the molecule, but not at a temperature and for a period so long that the thermostable enzyme is completely and irreversibly denatured or inactivated. After this denaturation of template, the temperature is decreased to a level that promotes hybridization of the primer to the complementary single-stranded molecule (template) produced from the previous step, as described above.

After this hybridization step, or concurrently with the hybridization step, the temperature is adjusted to a temperature that is effective to promote the activity of the thermostable enzyme to enable synthesis of a primer extension product using as a template both the newly synthesized and the original strands. The temperature again must not be so high as to separate (denature) the extension product from its template, as described above. Hybridization may occur during this step, so that the previous step of cooling after denaturation is not required. In such a case, using simultaneous steps, the preferred temperature range is 50° C. to 70° C.

The heating and cooling steps involved in one cycle of strand separation, hybridization, and extension product synthesis can be repeated as many times as needed to produce the desired quantity of the specific nucleic acid sequence. The only limitation is the amount of the primers, thermostable enzyme, and nucleoside triphosphates present. Usually, from 15 to 30 cycles are completed. For diagnostic detection of amplified DNA, the number of cycles will depend on the nature of the sample and the initial target concentration in the sample. For example, fewer cycles will be required if the sample being amplified is pure. If the sample is a complex mixture of nucleic acids, more cycles will be required to amplify the signal sufficiently for detection. For general amplification and detection, the process is repeated about 15 times. When amplification is used to generate sequences to be detected with labeled sequence-specific probes and when human genomic DNA is the target of amplification, the process is repeated 15 to 30 times to amplify the sequence sufficiently so that a clearly detectable signal is produced, i.e., so that background noise does not interfere with detection.

No additional nucleosides, primers, or thermostable enzyme need be added after the initial addition, provided that no key reagent has been exhausted and that the enzyme has not become denatured or irreversibly inactivated, in which case additional polymerase or other reagent would have to be added for the reaction to continue. Addition of such materials at each step, however, will not adversely affect the reaction. After the appropriate number of cycles has been completed to produce the desired amount of the specific nucleic acid sequence, the reaction can be halted in the usual manner, e.g., by inactivating the enzyme by adding EDTA, phenol, SDS, or CHCl₃ or by separating the components of the reaction.

The amplification process can be conducted continuously. In one embodiment of an automated process, the reaction mixture can be temperature cycled such that the temperature is programmed to be controlled at a certain level for a certain time. One such instrument for this purpose is the automated machine for handling the amplification reaction developed and marketed by Perkin-Elmer Cetus Instruments. Detailed instructions for carrying out PCR with the instrument are available upon purchase of the instrument.

Tma DNA polymerase is very useful in the diverse processes in which amplification of a nucleic acid sequence by the polymerase chain reaction is useful. The amplification method may be utilized to clone a particular nucleic acid sequence for insertion into a suitable expression vector, as described in U.S. Pat. No. 4,800,159. The vector may be used to transform an appropriate host organism to produce the gene product of the sequence by standard methods of recombinant DNA technology. Such cloning may involve direct ligation into a vector using blunt-end ligation, or use of restriction enzymes to cleave at sites contained within the primers. Other processes suitable for Tma polymerase include those described in U.S. Pat. Nos. 4,683,195 and 4,683,202 and European Patent Publication Nos. 229,701; 237,362; and 258,017; these patents and publications are incorporated herein by reference. In addition, the present enzyme is useful in asymmetric PCR (see Gyllensten and Erlich, 1988, Proc. Natl. Acad. Sci. USA 85:7652-7656, incorporated herein by reference); inverse PCR (Ochman et al., 1988, Genetics 129:621, incorporated herein by reference); and for DNA sequencing (see Innis et al., 1988, Proc. Natl. Acad. Sci. USA 85:9436-9440, and McConlogue et al., 1988, Nuc. Acids Res. 16(20):9869). Tma polymerase is also believed to have reverse transcriptase activity; see PCT Patent Publication No. 91/09944, published Jul. 11, 1991, incorporated herein by reference.

The reverse transcriptase activity of Tma DNA polymerase permits this enzyme to be used in methods for transcribing and amplifying RNA. The improvement of such methods resides in the use of a single enzyme, whereas previous methods have required more than one enzyme.

The improved methods comprise the steps of: (a) combining an RNA template with a suitable primer under conditions whereby the primer will anneal to the corresponding RNA template; and (b) reverse transcribing the RNA template by incubating the annealed primer-RNA template mixture with Tma DNA polymerase under conditions sufficient for the DNA polymerase to catalyze the polymerization of deoxyribonucleoside triphosphates to form a DNA sequence complementary to the sequence of the RNA template.

In another aspect of the above method, the primer that anneals to the RNA template may also be suitable for amplification by PCR. In PCR, a second primer that is complementary to the reverse transcribed cDNA strand provides a site for initiation of synthesis of an extension product. As already discussed above, the Tma DNA polymerase is able to catalyze this extension reaction on a cDNA template.

In the amplification of an RNA molecule by Tma DNA polymerase, the first extension reaction is reverse transcription, in which a DNA strand is produced in the form of an RNA/cDNA hybrid molecule. The second extension reaction, using the DNA strand as a template, produces a double-stranded DNA molecule. Thus, synthesis of a complementary DNA strand from an RNA template with Tma DNA polymerase provides the starting material for amplification by PCR.

When Tma DNA polymerase is used for nucleic acid transcription from an RNA template, the use of buffers that contain Mn²⁺ provides improved stimulation of Tma reverse transcriptase activity compared to previously used Mg²⁺-containing reverse transcription buffers. Consequently, increased cDNA yields also result from these methods.

As stated above, the product of RNA transcription by Tma DNA polymerase is an RNA/cDNA hybrid molecule. The RNA is then removed by heat denaturation or any number of other known methods including alkali, heat, or enzyme treatment. The remaining cDNA strand then serves as a template for polymerization of a self-complementary strand, thereby providing a double-stranded cDNA molecule suitable for amplification or other manipulation. The second strand synthesis requires a sequence-specific primer and Tma DNA polymerase.

Following the synthesis of the second cDNA strand, the resultant double-stranded cDNA molecule can serve a number of purposes, including DNA sequencing, amplification by PCR, or detection of a specific nucleic acid sequence. Specific primers useful for amplification of a segment of the cDNA can be added subsequent to the reverse transcription. Also, one can use a first set of primers to synthesize a specific cDNA molecule and a second nested set of primers to amplify a desired cDNA segment. All of these reactions are catalyzed by Tma DNA polymerase.

Tma DNA polymerase can also be used to simplify and improve methods for detection of RNA target molecules in a sample. In these methods, Tma DNA polymerase catalyzes: (a) reverse transcription; (b) second strand cDNA synthesis; and, if desired, (c) amplification by PCR. In addition to the improvement of only requiring a single enzyme, the use of Tma DNA polymerase in the described methods eliminates the previous requirement of two sets of incubation conditions that were necessary due to the use of different enzymes for each procedural step. The use of Tma DNA polymerase provides RNA transcription and amplification of the resulting complementary DNA with enhanced specificity and with fewer steps than previous RNA cloning and diagnostic methods. These methods are adaptable for use in kits for laboratory or clinical analysis.

The RNA that is transcribed and amplified in the above methods can be derived from a number of sources. The RNA template can be contained within a nucleic acid preparation from any organism, such as a viral or bacterial nucleic acid preparation. The preparation can contain cell debris and other components, purified total RNA, or purified mRNA. The RNA template can also be a population of heterogeneous RNA molecules in a sample. Furthermore, the target RNA can be contained in a biological sample, and the sample can be a heterogeneous sample in which RNA is but a small portion. Examples of such biological samples include blood samples and biopsied tissue samples.

Although the primers used in the reverse transcription step of the above methods are generally completely complementary to the RNA template, the primers need not be completely complementary. As in PCR, not every nucleotide of the primer must anneal to the template for reverse transcription to occur. For example, a non-complementary nucleotide sequence can be present at the 5' end of the primer with the remainder of the primer sequence being complementary to the RNA. Alternatively, non-complementary bases can be interspersed into the primer, provided that the primer sequence has sufficient complementarity with the RNA template for hybridization to occur and allow synthesis of a complementary DNA strand.

The following examples are offered by way of illustration only and are by no means intended to limit the scope of the claimed invention. In these examples, all percentages are by weight if for solids and by volume if for liquids, unless otherwise noted, and all temperatures are given in degrees Celsius.

EXAMPLE 1 Purification of Thermotoga maritima DNA Polymerase

This example describes the isolation of Tma DNA polymerase from Thermotoga maritima. The DNA polymerase was assayed at various points during purification according to the method described for Taq polymerase with one modification (1 mM MgCl₂) in Lawyer et al., 1989, J. Biol. Chem. 264(11):6427-6437, incorporated herein by reference.

Typically, this assay is performed in a total volume of 50 μl of a reaction mixture composed of 25 mM TAPS-HCl, pH 9.5 (20° C.); 50 mM KCl; 1 mM MgCl₂; 1 mM β-mercaptoethanol; 200 μM in each of dATP, dGTP, and TTP; 100 μM α-³²P-dCTP (0.03 to 0.07 μCi/nMol); 12.5 μg of activated salmon sperm DNA; and polymerase. The reaction is initiated by addition of polymerase in diluent (diluent is composed of 10 mM Tris-HCl, pH 8.0, 50 mM KCl, 0.1 mM EDTA, 1 mg/ml autoclaved gelatin, 0.5% NP40, 0.5% Tween 20, and 1 mM β-mercaptoethanol), and the reaction is carried out at 75° C. For the calculations shown below, one assumes that the volume of the polymerase (and diluent) added is 5 μl, and the total reaction volume is 50 μl. After a 10 minute incubation, the reaction is stopped by adding 10 μl of 60 mM EDTA. The reaction mixture is centrifuged, and 50 μl of reaction mixture is transferred to 1.0 ml of 50 μg/ml carrier DNA in 2 mM EDTA (at 0° C.). An equal volume (1 ml) of 20% TCA, 2% sodium pyrophosphate is added and mixed. The mixture is incubated at 0° C. for 15 to 20 minutes and then filtered through Whatman GF/C filters and extensively washed (6×5 ml) with a cold mixture containing 5% TCA and 1% pyrophosphate, followed by a cold 95% ethanol wash. The filters are then dried and the radioactivity counted. Background (minus enzyme) is usually 0.001% to 0.01% of input cpm. About 50 to 250 pmoles ³²P-dCTP standard is spotted for unit calculation. One unit is equal to 10 nmoles dNTP incorporated in 30 minutes at 75° C. Units are calculated as follows. ##EQU1## The 4.167 factor results from counting only 5/6 (50 μl) of the reaction volume after the stop solution is added (60 μl).
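The unit definition above can be turned into a rough calculation. The following Python sketch is not the formula referenced in the text, which is not reproduced here; it is one reading of the stated definitions. It assumes that the four dNTPs are incorporated in equal amounts (so total dNTP is four times the measured dCTP), that incorporation is linear in time, and that the specific activity of the labeled dCTP (cpm per pmol) has been determined from the spotted standard. All function and parameter names are hypothetical.

```python
def polymerase_units(sample_cpm, background_cpm, cpm_per_pmol_dctp,
                     incubation_min=10.0, counted_fraction=50.0 / 60.0):
    """Estimate units of activity in one assay (1 unit = 10 nmol dNTP / 30 min)."""
    # pmol of labeled dCTP found on the filter
    pmol_dctp_counted = (sample_cpm - background_cpm) / cpm_per_pmol_dctp
    # correct for counting only 50 of the 60 microliters present after the stop
    pmol_dctp_total = pmol_dctp_counted / counted_fraction
    # ASSUMPTION: equal incorporation of all four nucleotides
    pmol_dntp_total = 4.0 * pmol_dctp_total
    # scale the incubation to the 30 minute unit definition (assumes linearity)
    nmol_per_30_min = pmol_dntp_total / 1000.0 * (30.0 / incubation_min)
    return nmol_per_30_min / 10.0


def units_per_ml(units, enzyme_volume_ul=5.0):
    """Convert units per assay to units per ml of the enzyme sample added."""
    return units / enzyme_volume_ul * 1000.0
```

Under these assumptions, the correction for counting 5/6 of the stopped reaction applied to the 5 μl enzyme volume gives 5 × 5/6 = 4.167 μl, which is consistent with the factor mentioned in the text.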

All operations were carried out at 0° C. to 4° C. unless otherwise stated. All glassware was baked prior to use, and solutions used in the purification were autoclaved, if possible, prior to use.

About 50 g of frozen Thermotoga maritima strain MSB8 cells (provided by Prof. Dr. K. O. Stetter, Regensburg, Germany) were thawed in 25 ml of 3×TE-DTT buffer (150 mM Tris-Cl, pH 7.5, 3 mM EDTA, and 3 mM dithiothreitol) containing 2.4 mM PMSF (from 144 mM stock in DMF) and homogenized at low speed with a magnetic stirrer. The thawed cells were lysed in an Aminco French pressure cell (8,000 to 20,000 psi). The lysate was diluted with additional 1×TE-DTT buffer containing fresh 2.4 mM PMSF to a final volume of 5.5×cell wet weight and sonicated to reduce viscosity (40 to 100% output, 9 min., 50% duty cycle).

The resulting fraction, fraction I (275 ml), contained 5.31 g of protein and 15.5×10⁴ units of activity. Ammonium sulfate was added to 0.2 M (7.25 g) and the lysate stirred for 15 minutes on ice. Ammonium sulfate prevents the Tma DNA polymerase from binding to DNA in the crude lysate and reduces ionic interactions of the DNA polymerase with other cell lysate proteins.
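The ammonium sulfate addition above can be checked with a short calculation. This is only a sketch: the molar mass of ammonium sulfate (132.14 g/mol) is a standard value not given in the text, the function name is hypothetical, and the volume change on dissolving the salt is ignored.

```python
# Molar mass of ammonium sulfate, (NH4)2SO4, in g/mol (standard value, not from the text).
AMMONIUM_SULFATE_G_PER_MOL = 132.14

def grams_to_add(volume_ml, start_molar, target_molar):
    """Grams of solid salt needed to raise the concentration (volume change ignored)."""
    moles = (target_molar - start_molar) * volume_ml / 1000.0
    return moles * AMMONIUM_SULFATE_G_PER_MOL

# Fraction I: 275 ml brought from 0 to 0.2 M ammonium sulfate.
fraction_i_grams = grams_to_add(275.0, 0.0, 0.2)
print(round(fraction_i_grams, 2))
```

The result is within a few hundredths of a gram of the 7.25 g stated in the text.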

Empirical testing showed that 0.2% Polymin P (polyethyleneimine, PEI) precipitates >92% of the total nucleic acid. Polymin P (pH 7.5) was added slowly to 0.2% (5.49 ml of 10% PEI) and the slurry stirred 30 minutes on ice, then centrifuged at 30,000×g at 4° C. for 30 minutes. The supernatant was designated fraction II (246 ml) and contained 3.05 g of protein and 12.5×10⁴ units of activity.
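The protein and activity figures reported for fractions I and II allow a simple purification-table calculation. The following sketch derives specific activity, step yield, and fold purification from those numbers; the dictionary layout and function name are illustrative assumptions, not part of the original procedure.

```python
# Protein (mg) and activity (units) as reported for fractions I and II.
fractions = {
    "I": {"protein_mg": 5310.0, "units": 15.5e4},
    "II": {"protein_mg": 3050.0, "units": 12.5e4},
}

def specific_activity(fraction):
    """Units of polymerase activity per mg of total protein."""
    return fraction["units"] / fraction["protein_mg"]

sa_i = specific_activity(fractions["I"])
sa_ii = specific_activity(fractions["II"])
step_yield = fractions["II"]["units"] / fractions["I"]["units"]
fold_purification = sa_ii / sa_i

print(round(sa_i, 1), round(sa_ii, 1))   # units/mg
print(round(step_yield * 100, 1))        # percent of activity recovered
print(round(fold_purification, 2))
```

On these figures the Polymin P step retains about 81% of the activity while raising the specific activity about 1.4-fold.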

Fraction II was adjusted to 0.3 M ammonium sulfate by addition of 3.24 g solid ammonium sulfate to ensure complete binding of the DNA polymerase to phenyl sepharose. Fraction II was then loaded onto a 2.2×6.6 cm (25 ml) phenyl sepharose CL-4B (lot OM 08012, purchased from Pharmacia-LKB) column (equilibrated in TE containing 0.3 M ammonium sulfate and 0.5 mM DTT) at 38 ml/hr (10 ml/cm²/hr). All resins were equilibrated and recycled according to the manufacturer's recommendations.
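The column dimensions, bed volume, and flow rate quoted above are mutually consistent, as the following sketch shows. It assumes that "2.2×6.6 cm" means internal diameter × bed height of a cylindrical bed; the function name is hypothetical.

```python
import math

def column_geometry(diameter_cm, height_cm, linear_flow_ml_per_cm2_hr):
    """Cross-sectional area (cm^2), bed volume (ml), and volumetric flow (ml/hr)."""
    area = math.pi * (diameter_cm / 2.0) ** 2
    bed_volume = area * height_cm          # 1 cm^3 == 1 ml
    flow = area * linear_flow_ml_per_cm2_hr
    return area, bed_volume, flow

area, bed_volume, flow = column_geometry(2.2, 6.6, 10.0)
print(round(area, 2), round(bed_volume, 1), round(flow, 1))
```

The computed bed volume (about 25 ml) and flow rate (about 38 ml/hr) match the values given for the first phenyl sepharose column.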

The column was washed with 150 ml of the same buffer (A₂₈₀ to baseline), then with 90 ml TE containing 0.5 mM DTT (no ammonium sulfate), followed by a wash with 95 ml of 20% ethylene glycol in TE containing 0.5 mM DTT and, finally, eluted with 2 M urea in TE containing 20% ethylene glycol and 0.5 mM DTT. When the column fractions were assayed, a large proportion of the activity was found in the flow-through and wash fractions, indicating that the capacity of the column had been exceeded. Approximately 70% of the DNA polymerase which had bound to this first phenyl sepharose column eluted at low salt (with the TE-DTT wash), and the balance of the bound material eluted with the 2 M urea in 20% ethylene glycol in TE-DTT wash.

The flow-through activity from the first phenyl sepharose column was designated PSII load (226 ml) and contained 1.76 g protein. Fraction PSII load was applied to a second phenyl sepharose column (of the same lot and dimensions), and the run was repeated the same way. Again, the capacity of the column was exceeded, and activity was found to elute with both the low salt and 2 M urea washes. Only 10% of the bound DNA polymerase eluted with the TE-DTT wash; the major portion (˜90%) eluted with the 2 M urea in 20% ethylene glycol in TE-DTT wash.

The flow-through activity from the second phenyl sepharose column was combined with the TE-DTT eluates from the first and second phenyl sepharose columns and adjusted to 0.3 M ammonium sulfate. This fraction (PSIII load, 259.4 ml) contained 831 mg protein and was reapplied to a third phenyl sepharose column of 50 ml bed volume at 10 ml/cm²/hr. This time, all of the applied activity was retained by the column and only eluted with the 2 M urea in 20% ethylene glycol in TE-DTT wash.

All three urea eluates were separately concentrated ˜3 to 4 fold on Amicon YM30 membranes and dialyzed into heparin sepharose loading buffer shortly after elution to avoid prolonged exposure to urea (and thus carbamylation). The dialyzed and concentrated urea eluates were assayed for protein concentration and were found to vary greatly in their specific activity. Because the urea eluate from the second phenyl sepharose column contained the majority of the activity at significantly higher specific activity (˜8×10⁴ units of activity at ˜1000 units/mg protein) than the other two eluates, it was processed separately from them.

The dialyzed and concentrated phenyl sepharose II urea eluate was applied to a 5 ml bed volume heparin sepharose CL 6B (purchased from Pharmacia-LKB) column that had been equilibrated with 0.08 M KCl, 50 mM Tris-Cl, pH 7.5, 0.1 mM EDTA, 0.2% Tween 20, and 0.5 mM DTT. This column and all subsequent columns were run at 1 bed volume per hr. All of the applied DNA polymerase activity was retained by the column. The column was washed with 17 ml of the same buffer (A₂₈₀ to baseline) and eluted with 60 ml of a linear 80 to 500 mM KCl gradient in the same buffer.

Fractions (0.53 ml) eluting between 0.21 and 0.315 M KCl were analyzed by SDS-PAGE. The peak fractions eluting between 0.225 and 0.275 M KCl were pooled separately. The flanking fractions were kept to be combined later with other fractions. The pool of peak fractions (affigel I load) was diluted with affigel-blue buffer without KCl to reduce its ionic strength to 0.15 M KCl.
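To relate the KCl concentrations quoted for the heparin sepharose eluate to elution volumes and fraction numbers, one can model the gradient as ideal. The sketch below assumes a perfectly linear 60 ml gradient from 80 to 500 mM KCl, no dead volume, and 0.53 ml fractions numbered from the start of the gradient; none of these idealizations are stated in the text, and the function names are hypothetical.

```python
def volume_at(conc_mm, start_mm=80.0, end_mm=500.0, gradient_ml=60.0):
    """Elution volume (ml) at which an ideal linear gradient reaches conc_mm."""
    return (conc_mm - start_mm) * gradient_ml / (end_mm - start_mm)

def fraction_number(volume_ml, fraction_ml=0.53):
    """1-based index of the fraction being collected at a given elution volume."""
    return int(volume_ml // fraction_ml) + 1

# Peak pool boundaries quoted in the text: 0.225 and 0.275 M KCl.
first_peak = fraction_number(volume_at(225.0))
last_peak = fraction_number(volume_at(275.0))
print(round(volume_at(225.0), 2), round(volume_at(275.0), 2))
print(first_peak, last_peak)
```

Under these assumptions the peak pool spans roughly 7 ml of the gradient, or about fourteen 0.53 ml fractions.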

The affigel I load fraction contained 3.4 mg of protein and was applied to a 4.3 ml affigel-blue (purchased from BioRad) column, which had been equilibrated in 25 mM Tris-Cl, pH 7.5, 0.1 mM EDTA, 0.2% Tween 20, 0.5 mM DTT, and 0.15 M KCl. All of the applied Tma DNA polymerase was retained. The column was washed with 15 ml of the same buffer and eluted with a 66 ml linear 0.15 to 0.7 M KCl gradient in the same buffer.

Fractions (0.58 ml) eluting between 0.34 and 0.55 M KCl were analyzed by SDS-PAGE and appeared to be >90% pure. The polymerase peak fractions were no longer contaminated with site-specific endonuclease (indicated by absence of lower-molecular-weight specific DNA fragments after one or twenty-two hours incubation at 65° C. with 2 units of Tma polymerase using 600 ng of plasmid pLSG1 (ccc-DNA)). The polymerase peak fractions eluting between 0.3 and 0.55 M KCl were pooled and concentrated ˜20-fold on an Amicon YM 30 membrane. This fraction was then diafiltered into 2.5×storage buffer (50 mM Tris-Cl, pH 7.5, 250 mM KCl, 0.25 mM EDTA, 2.5 mM DTT, and 0.5% Tween 20 [Pierce, Surfact-Amps]) and stored at 4° C.

The urea eluates from the first and third phenyl sepharose columns were combined with the flanking fractions from the first heparin sepharose column. This pool (HSII load) contained ˜200 mg protein and was diluted with heparin sepharose buffer without KCl to adjust its ionic strength to 80 mM KCl. HSII load was applied to a 16 ml bed volume heparin sepharose column (equilibrated in 80 mM KCl, 50 mM Tris-Cl, pH 7.5, 0.1 mM EDTA, 0.2% Tween 20, and 0.5 mM DTT). No detectable polymerase activity appeared in the flow-through fractions.

The column was washed with 80 ml of the same buffer and eluted with a 200 ml linear 80 to 750 mM KCl gradient in the same buffer. Fractions (2 ml) eluting between 0.225 and 0.335 M KCl were combined, concentrated ˜5-fold on an Amicon YM 30 membrane, and dialyzed into hydroxyapatite buffer. This fraction (HA load) contained 9.3 mg protein and was loaded onto a 4 ml bed volume hydroxyapatite (high resolution HPT, purchased from Calbiochem) column that had been equilibrated in 10 mM potassium phosphate buffer, pH 7.5, 0.5 mM DTT, 0.1 mM EDTA, and 0.2% Tween 20. All of the applied DNA polymerase activity was retained by the column.

The column was washed with 12 ml of the same buffer and eluted with a 60 ml linear 10 to 500 mM potassium phosphate (pH 7.5) gradient. Fractions (0.8 ml) eluting between 0.105 and 0.230 M potassium phosphate were analyzed by SDS-PAGE. Compared to the affigel column I load fraction (which by SDS-PAGE appeared to be ˜10 to 20% pure), these fractions were ˜5-fold less pure. The DNA polymerase peak fractions eluting between 0.105 and 0.255 M potassium phosphate were combined, concentrated ˜3-fold on an Amicon YM 30 membrane, and diafiltered into affigel-blue buffer.

The affigel II load fraction was applied to a 3 ml bed volume affigel-blue column that had been equilibrated in affigel-blue buffer. No detectable DNA polymerase activity appeared in the flow-through fractions. The column was washed with 9 ml of the same buffer and eluted with a 50 ml linear 0.2 to 0.7 M KCl gradient in the same buffer. Fractions (0.58 ml) eluting between 0.33 and 0.505 M KCl were analyzed by SDS-PAGE. Because the earlier eluting fractions looked slightly cleaner by their silver staining pattern, two pools were made. Fractions eluting between 0.31 and 0.4 M KCl were combined into pool I; fractions eluting between 0.4 and 0.515 M KCl were combined into pool II. The two pools were each separately concentrated ˜7-fold on an Amicon YM 30 membrane.

All three affigel-blue pools still contained high levels of contaminating, nonspecific nucleases. Upon incubation at 70° C. with 1.5 units of DNA polymerase, both a single-strand M13 DNA template and a multifragment restriction digest of a plasmid were degraded within a few hours. In situ activity gels were run and showed that the DNA polymerase fractions had not suffered proteolytic degradation.

The two pools from the second affigel-blue column were combined and dialyzed into a phosphocellulose column buffer. The dialyzed fraction (Pll I load) was loaded onto a 3 ml phosphocellulose column, which had been washed overnight with 25 mM Tris-Cl, pH 7.5, 50 mM KCl, 0.1 mM EDTA, 0.2% Tween 20, and 0.5 mM DTT. This wash later proved to have been insufficient to equilibrate the pH of the phosphocellulose resin. Unfortunately, this was discovered after the sample had been loaded onto the column. All of the applied activity bound to the column.

The column was washed with 9 ml of loading buffer and eluted with a 45 ml linear 50 to 700 mM KCl gradient. DNA polymerase peak fractions (0.58 ml) eluting between 0.46 and 0.575 M KCl were analyzed by SDS-PAGE.

Separation of contaminating proteins was observed throughout the peak: a ˜45 kDa contaminating band elutes at 0.53 M KCl; an ˜85 kDa contaminating band has an elution peak at 0.54 M KCl. Therefore, this column was repeated (loading at somewhat higher ionic strength considering the elution profile of the polymerase). The peak fractions, eluting between 0.475 and 0.56 M KCl from the first phosphocellulose column, were combined with the pool from the first affigel column. The combined fraction (Pll II load) now contained all of the purified polymerase (˜7.5×10⁴ units).

Fraction Pll II load was diluted with phosphocellulose buffer to adjust its ionic strength to 0.2 M KCl. Pll II load was loaded onto a 9 ml bed volume phosphocellulose column, which, this time, had been equilibrated to the correct pH and ionic strength of 25 mM Tris-Cl, pH 7.5, 200 mM KCl, 0.1 mM EDTA, 0.2% Tween 20, and 0.5 mM DTT. The column was washed with 27 ml of the same buffer and was intended to be eluted with a 140 ml linear 0.2 to 0.8 M KCl gradient. However, instead of an upper limit buffer of 0.8 M KCl, the buffer had a concentration of 52 mM KCl, which resulted in a gradient decreasing in salt. The column was then reequilibrated with 32 ml of 0.2 M KCl-phosphocellulose buffer, and the 140 ml linear 0.2 to 0.8 M KCl gradient was reapplied.

The routine assays of flow-through, wash, and gradient fractions showed that, at this higher pH (pH 7.5), the DNA polymerase does not bind to the phosphocellulose resin at 0.2 M KCl. The fractions containing DNA polymerase activity from the flow-through, wash, and decreasing salt gradient were combined. The resulting pool was concentrated on an Amicon YM30 membrane. However, a mishap with the concentrator led to further losses of DNA polymerase activity. The recovered activity was dialyzed into phosphocellulose buffer with 50 mM KCl and designated Pll III load.

This fraction was loaded onto a 5 ml bed volume phosphocellulose column that had been equilibrated with phosphocellulose buffer with 50 mM KCl. All of the applied activity was retained by the column. The column was washed with 15 ml of the same buffer and eluted with a 45 ml linear 50 to 500 mM KCl gradient in the same buffer. Fractions (0.87 ml) eluting between 0.16 and 0.33 M KCl were analyzed by SDS-PAGE and in situ activity gels.

Based on the silver staining pattern, two pools were made. The peak fractions, eluting between 0.215 and 0.31 M KCl, were kept separate from the leading and trailing fractions, which were combined into a side-fractions pool. Both pools were concentrated on Centricon 30 membranes and diafiltered into 2.5×storage buffer (50 mM Tris-HCl, pH 7.5, 250 mM KCl, 0.25 mM EDTA, 2.5 mM DTT, and 0.5% Tween 20 [Pierce, Surfact-Amps]) and subsequently mixed with 1.5 volumes of 80% glycerol.
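Mixing one volume of 2.5×storage buffer with 1.5 volumes of 80% glycerol dilutes every buffer component 2.5-fold. The sketch below works out the resulting final composition; it assumes simple volume additivity, and the variable names are illustrative only.

```python
# Components of the 2.5x storage buffer as listed in the text.
storage_2_5x = {
    "Tris-HCl (mM)": 50.0,
    "KCl (mM)": 250.0,
    "EDTA (mM)": 0.25,
    "DTT (mM)": 2.5,
    "Tween 20 (%)": 0.5,
}

buffer_volumes = 1.0      # one volume of 2.5x buffer
glycerol_volumes = 1.5    # 1.5 volumes of 80% glycerol
total_volumes = buffer_volumes + glycerol_volumes

# Every buffer component is diluted by total/buffer = 2.5-fold.
final = {name: conc * buffer_volumes / total_volumes
         for name, conc in storage_2_5x.items()}
final["glycerol (%)"] = 80.0 * glycerol_volumes / total_volumes

for name, conc in final.items():
    print(name, round(conc, 3))
```

Under this assumption the enzyme ends up in 1×storage buffer (20 mM Tris-HCl, 100 mM KCl, 0.1 mM EDTA, 1 mM DTT, 0.2% Tween 20) containing 48% glycerol.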

About 3.1×10⁴ units were recovered in the peak fraction; the side pool yielded an additional 1×10³ units of activity. The purified DNA polymerase was undegraded as evidenced by an unchanged migration pattern in an in situ activity gel. The molecular weight as determined by gel electrophoresis of the purified DNA polymerase is approximately 97 kDa. Tma DNA polymerase is recognized by epitope-specific antibodies that correspond to Taq DNA polymerase amino acid residues number 569 through 587 (DGTP1) and 718 through 732 (DGTP3).
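As a rough check on overall recovery, the final pools can be compared with the activity measured in fraction I. This sketch considers only the last phosphocellulose pools reported above and uses no figures beyond those in the text.

```python
fraction_i_units = 15.5e4   # activity in the crude lysate (fraction I)
peak_pool_units = 3.1e4     # final peak pool
side_pool_units = 1.0e3     # final side-fractions pool

recovered = peak_pool_units + side_pool_units
overall_yield = recovered / fraction_i_units
print(round(overall_yield * 100, 1))  # percent of fraction I activity
```

About one fifth of the starting activity is therefore accounted for in these two pools.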

EXAMPLE 2 Isolation of DNA Encoding Tma DNA Polymerase I Activity

Synthetic oligodeoxyribonucleotides DG164 through DG167 are four different 16-fold degenerate (each) 22mer pools designed as "forward" primers to one of the motifs in the template binding domains (3'-most 14 nucleotides) of thermostable DNA polymerases. This motif is the amino acid sequence Gly-Tyr-Val-Glu-Thr and corresponds identically to the T. aquaticus (Taq) DNA polymerase amino acids 718 through 722 and to the T. thermophilus (Tth) DNA polymerase amino acids 720 through 724. This motif is found in a DNA polymerase gene in all Thermus species. The combined primer pool is 64-fold degenerate, and the primers encode a BglII recognition sequence at their 5'-ends.

Forward primers DG164 through DG167 are shown below:

    ______________________________________
    DG164  SEQ ID NO: 2  5'CGAGATCTGGNTAYGTWGAAAC
    DG165  SEQ ID NO: 3  5'CGAGATCTGGNTAYGTWGAGAC
    DG166  SEQ ID NO: 4  5'CGAGATCTGGNTAYGTSGAAAC
    DG167  SEQ ID NO: 5  5'CGAGATCTGGNTAYGTSGAGAC
    ______________________________________

In these forward primers: A is Adenine; C is Cytosine; G is Guanine; T is Thymine; Y is C+T (pYrimidine); S is G+C (Strong interaction; 3 H-bonds); W is A+T (Weak interaction; 2 H-bonds); and N is A+C+G+T (aNy).
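The degeneracy figures quoted for the forward primers follow directly from the ambiguity codes defined above. The sketch below counts and enumerates the sequences in each pool; the helper names `degeneracy` and `expand` are illustrative, and the code table also includes R and K, which are defined elsewhere in this example.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes used in the primer tables.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "GC", "W": "AT", "K": "GT", "N": "ACGT",
}

def degeneracy(primer):
    """Number of distinct sequences in a degenerate primer pool."""
    count = 1
    for base in primer:
        count *= len(IUPAC[base])
    return count

def expand(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer))]

forward = {
    "DG164": "CGAGATCTGGNTAYGTWGAAAC",
    "DG165": "CGAGATCTGGNTAYGTWGAGAC",
    "DG166": "CGAGATCTGGNTAYGTSGAAAC",
    "DG167": "CGAGATCTGGNTAYGTSGAGAC",
}

for name, seq in forward.items():
    print(name, len(seq), degeneracy(seq))

# The four pools do not overlap, so the combined pool is the sum.
combined = {s for seq in forward.values() for s in expand(seq)}
print(len(combined))
```

Each pool is a 22mer with N, Y, and one W or S position (4 × 2 × 2 = 16 sequences), and the four pools together give the 64-fold degenerate mixture described in the text.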

Synthetic oligodeoxyribonucleotides DG160 through DG163 are four different 8-fold degenerate (each) 20mer pools designed as "reverse" primers to one of the motifs in the template binding domains (3'-most 14 nucleotides) of thermostable DNA polymerases. These primers are designed to complement the (+)-strand DNA sequence that encodes the motif Gln-Val-His-Asp-Glu and that corresponds identically to the Taq DNA polymerase amino acids 782 through 786 and to the Tth DNA polymerase amino acids 784 through 788. This motif is found in a DNA polymerase gene in all Thermus species. The combined primer pool is 32-fold degenerate, and the primers encode an EcoRI recognition sequence at their 5'-ends.

Reverse primers DG160 through DG163 are shown below:

    ______________________________________
    DG160  SEQ ID NO: 6  5'CGGAATTCRTCRTGWACCTG
    DG161  SEQ ID NO: 7  5'CGGAATTCRTCRTGWACTTG
    DG162  SEQ ID NO: 8  5'CGGAATTCRTCRTGSACCTG
    DG163  SEQ ID NO: 9  5'CGGAATTCRTCRTGSACTTG
    ______________________________________

In these reverse primers, A, C, G, T, S, and W are as defined above, and R is G+A (puRine).
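Because the reverse primers are written as the complement of the coding strand, it can help to reverse-complement the priming region and read it in the sense orientation. The sketch below does this for DG160 using ambiguity-aware complements (R pairs with Y; S, W, and N are self-complementary); the function name is hypothetical.

```python
# Complement table including the IUPAC ambiguity codes used in the primers.
COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "R": "Y", "Y": "R", "S": "S", "W": "W", "N": "N",
}

def reverse_complement(seq):
    """Reverse complement of a (possibly degenerate) DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

dg160 = "CGGAATTCRTCRTGWACCTG"
priming_region = dg160[-14:]                 # 3'-most 14 nucleotides
sense = reverse_complement(priming_region)   # coding-strand orientation
print(priming_region, sense)
print(dg160[2:8])                            # EcoRI recognition sequence
```

Read in triplets, the sense-strand sequence is CAG GTW CAY GAY GA, which under the standard genetic code is consistent with Gln-Val-His-Asp followed by the first two bases of a Glu codon.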

To amplify an ˜230 bp fragment of the Tma DNA polymerase gene, a PCR amplification tube was prepared without MgCl₂ that contained in 80 μl: (1) 5 ng denatured Tma genomic DNA; (2) 50 pmoles (total) of the combined forward primer set DG164-DG167; (3) 50 pmoles (total) of the combined reverse primer set DG160-DG163; (4) 2 units Taq DNA polymerase; (5) 50 μM each (final) dNTP; (6) 0.05% Laureth-12; and (7) standard PCR buffer except no magnesium chloride.

The sample was flash-frozen at -70° C. and then stored at -20° C. The frozen sample was carefully layered with 20 μl of 10 mM MgCl₂ (final concentration 2 mM), immediately overlayed with 50 μl of mineral oil, and cycled in a Perkin Elmer Cetus Thermal Cycler according to the following file: (1) step to 98° C.--hold 50 seconds; (2) step to 50° C.--hold 10 seconds; (3) ramp to 75° C. over 4 minutes; and (4) step to 98° C. The file was repeated for a total of 30 cycles. One-fifth (20 μl) of the amplification product was purified on a 3% Nusieve/1% Seakem agarose composite gel, and the approximately 230 bp fragment was eluted, concentrated, and digested with BglII and EcoRI.
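The final concentrations and the minimum run time implied by this protocol can be worked out as follows. The sketch assumes that the 80 μl reaction and the 20 μl MgCl₂ layer mix to a 100 μl aqueous phase (the mineral oil is not counted) and that the time for each "step" temperature transition is negligible; both are simplifications, and the function name is hypothetical.

```python
def final_concentration(stock_conc, added_ul, total_ul):
    """Concentration after diluting added_ul of stock into total_ul."""
    return stock_conc * added_ul / total_ul

total_ul = 80.0 + 20.0                               # reaction mix + MgCl2 layer
mgcl2_mm = final_concentration(10.0, 20.0, total_ul)
primer_um = 50.0 / total_ul                          # pmol per microliter equals micromolar

# Programmed time per cycle: 50 s hold, 10 s hold, 4 min ramp.
cycle_seconds = 50 + 10 + 4 * 60
run_minutes = cycle_seconds * 30 / 60

print(mgcl2_mm, primer_um, cycle_seconds, run_minutes)
```

This reproduces the stated 2 mM final MgCl₂, gives 0.5 μM for each combined primer set, and implies at least 150 minutes of programmed time for 30 cycles.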

Synthetic oligodeoxyribonucleotides DG154 and DG155 are two different 32-fold degenerate (each) 19mer pools designed as "forward" primers to one of the motifs in the primer:template binding domains (3'-most 11 nucleotides) of thermostable DNA polymerases. This motif is the tetrapeptide sequence Thr-Ala-Thr-Gly and corresponds identically to the Taq DNA polymerase amino acids 569 through 572 and to Tth DNA polymerase amino acids 571 through 574. This motif is found in a DNA polymerase gene in all Thermus species. The combined primer pool is 64-fold degenerate, and the primers encode a BglII recognition sequence at their 5'-ends.

Forward primers DG154 and DG155 are presented below:

    ______________________________________
    DG154  SEQ ID NO: 10  CGAGATCTACNGCNACWGG
    DG155  SEQ ID NO: 11  CGAGATCTACNGCNACSGG
    ______________________________________

In these forward primers, A, C, G, T, S, W, and N are as defined above.

To amplify an approximately 667 bp fragment of the Tma DNA polymerase gene, a PCR amplification tube was prepared without MgCl₂ that contained, in 80 μl: (1) 5 ng denatured Tma genomic DNA; (2) 50 pmoles (total) of the combined forward primer set DG154-DG155; (3) 50 pmoles (total) of the combined reverse primer set DG160-DG163; (4) 2 units of Taq DNA polymerase; (5) 50 μM each (final) dNTP; (6) 0.05% Laureth-12; and (7) standard PCR buffer except no magnesium chloride.

The sample was flash-frozen at -70° C. and then stored at -20° C. The frozen sample was carefully layered with 20 μl of 10 mM MgCl₂ (final concentration 2 mM), immediately overlayed with 50 μl of mineral oil, and cycled in a Perkin Elmer Cetus Thermal Cycler according to the following file: (1) step to 98° C.--hold 50 seconds; (2) step to 55° C.--hold 10 seconds; (3) ramp to 75° C. over 4 minutes; and (4) step to 98° C. The file was repeated for a total of 30 cycles.

One-fifth (20 μl) of the amplification product was purified on a 1.5% agarose gel, and the approximately 670 bp fragment was eluted, concentrated, and digested with BglII and EcoRI as above.

These amplification reactions yielded a 667 bp fragment and a 230 bp fragment, which was a subfragment of the 667 bp fragment. These fragments proved useful in obtaining the complete coding sequence for the Tma DNA polymerase I gene, as described in the following example.

EXAMPLE 3 Cloning the Thermotoga maritima (Tma) DNA Polymerase I Gene

This Example describes the strategy and methodology for cloning the Tma DNA polymerase I (Tma Pol I) gene of Thermotoga maritima.

The DNA sequences of the PCR products generated with primers DG164-167 and DG160-163 (230 bp) and DG154, 155 and DG160-163 (667 bp) contain an XmaI restriction site recognition sequence, 5'CCCGGG. Oligonucleotides were designed to hybridize to sequences upstream and downstream of the XmaI site. DG224 is a 21mer, homologous to the PCR products 59-79 bp 3'-distal to the XmaI site. DG225 is a 22mer, homologous to the PCR products from the XmaI site to 21 bp upstream (5') of the XmaI site. The sequences of DG224 and DG225 are shown below (K is G or T).

    ______________________________________
    DG224  SEQ ID NO: 12  5'ACAGCAGCKGATATAATAAAG
    DG225  SEQ ID NO: 13  5'GCCATGAGCTGTGGTATGTCTC
    ______________________________________

DG224 and DG225 were labelled by tailing with biotin-dUTP and terminal transferase in reactions designed to add approximately 8 biotin-dUTP residues to the 3'-end of oligonucleotides. These labelled oligonucleotides were used as probes in Southern blot analyses of restriction digests of genomic Tma DNA. A preliminary restriction map was generated based on the Southern analysis results and the DNA sequences of the PCR products that were generated as described in Example 2.

The preliminary map showed that the entire Tma DNA polymerase gene is contained in two XmaI fragments. Most of the gene, including the 5'-end, resides on an approximately 2.6 kb XmaI fragment. The remainder of the gene (and the 3'-end) resides on an approximately 4.2 kb XmaI fragment. The two XmaI fragments containing the entire Tma DNA polymerase gene were cloned into plasmid pBS13+ (also called pBSM13+) as described below.

About 40 micrograms of Tma genomic DNA were digested to completion with XmaI. The XmaI digest was size-fractionated via electroelution. Slot blot analyses of a small portion of each fraction, using γ-³²P-ATP-kinased DG224 and DG225 probes, identified the fractions containing the 4.2 kb 3'-fragment (hybridizing with DG224) and the 2.6 kb 5'-fragment (hybridizing with DG225). Fractions were concentrated via ethanol precipitation and then ligated with XmaI-digested pBS13+ (Stratagene). Ampicillin-resistant transformants were selected on nitrocellulose filters and the filters probed with γ-³²P-ATP-kinased DG224 or DG225 probe as appropriate. Plasmid DNA was isolated from colonies that hybridized with probe. Restriction analysis was performed to confirm that fragments were as expected and to determine orientation of fragments relative to the pBS13+ vector.

DNA sequence analysis of the cloned fragments was performed using the "universal" and "reverse" sequencing primers (which prime in the vector, outside the restriction site polylinker region). In addition, for 5'-clones, the primers used to determine the DNA sequence of the DG154-155/DG160-163 667 bp PCR clone were employed. Preliminary DNA sequence analysis confirmed that the desired DNA fragments containing the Tma DNA polymerase gene had been cloned.

From the preliminary DNA sequence, further sequencing primers were designed to obtain DNA sequence of more internal regions of the fragments. In addition, to facilitate DNA sequence analysis, several deletions of the two XmaI fragments were made. For both orientations of the 2.6 kb 5'-fragment, EcoRI, SacI, and XbaI digests were each diluted and ligated under conditions that favored intramolecular ligation, thus deleting DNA between the vector EcoRI, SacI, and XbaI sites and the corresponding sites in the Tma XmaI fragment. Such internal deletions allow ready DNA sequence analysis using the "universal" or "reverse" sequencing primers.

Similarly, a deletion of the 4.2 kb 3'-fragment was made, fusing the BamHI site of the vector with the BglII site approximately 650 bp from the Tma Pol I internal XmaI site in that clone (BamHI and BglII have identical GATC cohesive ends that ligate readily with one another). This deletion allows for DNA sequence analysis of the 3'-end of the Tma Pol I gene.
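The compatibility of BamHI and BglII ends noted above can be illustrated with a small model. The recognition sequences and cut positions used here (G^GATCC for BamHI and A^GATCT for BglII) are standard enzyme properties rather than values taken from the text, and the helper names are hypothetical.

```python
# Recognition site and top-strand cut offset (bases before the cut).
# These are standard enzyme properties, not stated in the text.
ENZYMES = {
    "BamHI": ("GGATCC", 1),   # G^GATCC
    "BglII": ("AGATCT", 1),   # A^GATCT
}

def overhang(enzyme):
    """5' single-stranded overhang left by a symmetric cut in a palindromic site."""
    site, cut = ENZYMES[enzyme]
    return site[cut:len(site) - cut]

def fused_junction(left_enzyme, right_enzyme):
    """Top-strand sequence formed when a left end is ligated to a right end."""
    left_site, left_cut = ENZYMES[left_enzyme]
    right_site, right_cut = ENZYMES[right_enzyme]
    return left_site[:left_cut] + right_site[right_cut:]

print(overhang("BamHI"), overhang("BglII"))
junction = fused_junction("BamHI", "BglII")
print(junction)
print(junction in {site for site, _ in ENZYMES.values()})
```

Both enzymes leave the same GATC overhang, and the BamHI/BglII fusion gives GGATCT, a hybrid sequence that matches neither recognition site and so is not recut by either enzyme.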

Restriction site analysis reveals that both the 2.6 kb 5'-fragment and the 4.2 kb 3'-fragment lack NcoI, NdeI, and AseI restriction sites. Knowing the ATG start and coding sequence of the Tma Pol I gene, one can design oligonucleotides that will alter the DNA sequence at the ATG start to include an NcoI, NdeI, or AseI restriction site via oligonucleotide site-directed mutagenesis. In addition, the mutagenic oligonucleotides can be designed such that a deletion of sequences between the lac promoter in the pBS13+ vector and the beginning of the Tma Pol I gene is made concurrent with the inclusion of an NdeI or AseI recognition sequence at the ATG start.

The deletion of sequences between the lac promoter in the vector and the start of the Tma Pol I gene would also eliminate the XmaI restriction site in the deleted region, thus making it convenient to assemble the entire coding sequence in an expression plasmid using conventional skill in the art (see, e.g., synthesis protocols for pDG174 - pDG181 in copending Ser. No. 455,967, filed Dec. 22, 1989, incorporated herein by reference, and Example 5).

EXAMPLE 4 PCR With Tma DNA Polymerase

About 1.25 units of the Tma DNA polymerase purified in Example 1 is used to amplify rRNA sequences from Tth genomic DNA. The reaction volume is 50 μl, and the reaction mixture contains 50 pmol of primer DG73, 10⁵ to 10⁶ copies of the Tth genome (˜2×10⁵ copies of genome/ng DNA), 50 pmol of primer DG74, 200 μM of each dNTP, 2 mM MgCl₂, 10 mM Tris-HCl, pH 8.3, 50 mM KCl, and 100 μg/ml gelatin (optionally, gelatin may be omitted).
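The template copy numbers above translate into DNA mass using the stated conversion of about 2×10⁵ genome copies per ng. The sketch below performs that conversion and also expresses the primer amount as a concentration in the 50 μl reaction; the function name is illustrative.

```python
COPIES_PER_NG = 2e5   # approximate Tth genome copies per ng DNA (from the text)

def template_ng(copies):
    """Mass of genomic DNA (ng) corresponding to a given number of genome copies."""
    return copies / COPIES_PER_NG

low_ng = template_ng(1e5)
high_ng = template_ng(1e6)

reaction_ul = 50.0
primer_um = 50.0 / reaction_ul   # 50 pmol in 50 microliters; pmol/microliter equals micromolar

print(low_ng, high_ng, primer_um)
```

The reaction therefore contains roughly 0.5 to 5 ng of Tth genomic DNA and each primer at 1 μM.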

The reaction is carried out on a Perkin-Elmer Cetus Instruments DNA Thermal Cycler. Twenty to thirty cycles of 96° C. for 15 seconds, 50° C. for 30 seconds, and 75° C. for 30 seconds are carried out. At 20 cycles, the amplification product (160 bp in size) can be faintly seen on an ethidium bromide stained gel, and at 30 cycles, the product is readily visible (under UV light) on the ethidium bromide stained gel.

The PCR may yield fewer non-specific products if fewer units of Tma DNA polymerase are used (i.e., 0.31 units/50 μl reaction). Furthermore, the addition of a non-ionic detergent, such as laureth-12, to the reaction mixture to a final concentration of about 0.5% to 1% can improve the yield of PCR product.

Primers DG73 and DG74 are shown below:

    ______________________________________
    DG73  SEQ ID NO: 14  5'TACGTTCCCGGGCCTTGTAC
    DG74  SEQ ID NO: 15  5'AGGAGGTGATCCAACCGCA
    ______________________________________

EXAMPLE 5 Recombinant Expression Vectors for Tma DNA Polymerase

A. Mutagenesis of the 5' and 3' Ends of the Tma Pol I Gene

The 5'-end of the Tma gene in vector pBS:Tma7-1 (ATCC No. 68471, later renamed pTma01) was mutagenized with oligonucleotides DG240 and DG244 via oligonucleotide site-directed mutagenesis. Plasmid pBS:Tma7-1 consists of the 2.6 kb 5' XmaI fragment cloned into vector pBS13+. Resultant mutants from both mutageneses had deletions between the ATG of β-galactosidase in the pBS+ vector and the ATG of Tma Pol I so that the Tma coding sequence was positioned for expression utilizing the vector lac promoter, operator, and ribosome binding site (RBS). Both sets of mutants also had alterations in the second and sixth codons for Tma Pol I to be more compatible with the codon usage of E. coli without changing the amino acid sequence of the encoded protein. In addition, DG240 placed an NdeI restriction site at the ATG start of the coding sequence (5'CATATG), and DG244 placed an NcoI restriction site at the ATG start of the coding sequence (5'CCATGG). DG240 mutant candidate colonies were screened with [γ-³²P]-labelled oligonucleotide DG241, and DG244 mutant candidate colonies were screened with [γ-³²P]-labelled oligonucleotide DG245. Plasmid DNA was isolated from colonies that hybridized with the appropriate probes, and mutations were confirmed via restriction analysis and DNA sequence analysis. The DG240 mutant was named pTma5'Nde#3 and later renamed pTma06. The DG244 mutant was named pTma5'Nco#9 and later renamed pTma07.

The 3'-end of the Tma Pol I gene was mutagenized in pBSTma3'11-1 Bam/Bgl (ATCC No. 68472, later renamed pTma04) with mutagenic oligonucleotide DG238. Plasmid pBSTma3'11-1 Bam/Bgl was constructed as described in Example 3 by cloning the 4.2 kb 3' XmaI fragment into pBS13+, digesting the resulting plasmid with BamHI and BglII, and circularizing by ligation the large fragment from the digestion. DG238 inserts EcoRV and BamHI sites immediately downstream of the TGA stop codon. Mutant colony candidates were identified with [γ-³²P]-labelled oligonucleotide DG239. Plasmid DNA isolated from positive colonies was screened for appropriate restriction digest patterns, and the DNA sequence was confirmed. One correct plasmid obtained was designated as pTma3'mut#1 and later renamed pTma05.
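The restriction sites introduced by the mutagenic oligonucleotides can be confirmed by a simple string search. The sketch below uses the DG240, DG244, and DG238 sequences listed in the oligonucleotide table later in this example; the NdeI and NcoI sequences are those given in the text, while the EcoRV (GATATC) and BamHI (GGATCC) recognition sequences are standard values assumed here. All four sites are palindromic, so searching one strand is sufficient.

```python
SITES = {
    "NdeI": "CATATG",    # given in the text
    "NcoI": "CCATGG",    # given in the text
    "EcoRV": "GATATC",   # standard recognition sequence (assumed)
    "BamHI": "GGATCC",   # standard recognition sequence (assumed)
}

OLIGOS = {
    "DG240": "CCATCAAAAAGAAATAGTCTAGCCATATGTGTTTCCTGTGTGAAATTG",
    "DG244": "CCATCAAAAAGAAATAGTCTAGCCATGGTTGTTTCCTGTGTGAAATTG",
    "DG238": "GCAAAACATGGTCGTGATATCGGATCCGGAGGTGTTATCTGTGG",
}

def sites_in(oligo):
    """Names of the restriction sites whose sequence occurs in the oligonucleotide."""
    return sorted(name for name, site in SITES.items() if site in oligo)

found = {name: sites_in(seq) for name, seq in OLIGOS.items()}
for name, hits in found.items():
    print(name, hits)
```

DG240 carries only the NdeI site, DG244 only the NcoI site, and DG238 carries adjacent EcoRV and BamHI sites, in agreement with the description above.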

B. Assembling the Full-Length Gene in a lac Promoter Vector

For purposes of studying low level expression of Tma Pol I in E. coli and possible complementation of E. coli polymerase mutants by Tma Pol I (where high level expression might kill the cell, but where low level expression might rescue or complement), the Tma Pol I gene was assembled in the pBS13+ cloning vector. An ˜300 bp XmaI to EcoRV fragment from pTma3'mut#1 was isolated and purified, following agarose gel electrophoresis and ethidium bromide staining, by excising an agarose gel slice containing the ˜300 bp fragment and freezing in a Costar Spin-X filter unit. Upon thawing, the unit was spun in a microfuge, and the liquid containing the DNA fragment was collected. After ethanol precipitation, the fragment was ligated with each of the two 5'-mutated vectors, pTma5'Nde#3 and pTma5'Nco#9, which had each been digested with Asp718, repaired with Klenow and all 4 dNTPs (the reaction conditions are 56 mM Tris-Cl, pH 8.0, 56 mM NaCl, 6 mM MgCl₂, 6 mM DTT, 5 μM dNTPs, and 11 units of Klenow at 37° C. for 15 minutes; then inactivate at 75° C. for 10 minutes), and then further digested with XmaI.

The ligation was carried out in two steps. To ligate the XmaI sticky ends, the conditions were 20 μg/ml total DNA, 20 mM Tris-Cl, pH 7.4, 50 mM NaCl, 10 mM MgCl₂, 40 μM ATP, and 0.2 Weiss units T4 DNA ligase per 20 μl reaction at 0° C. overnight. To ligate Asp718-digested, Klenow-repaired blunt ends with EcoRV-digested blunt ends, the first ligations were diluted 4 to 5 fold and incubated at 15° C. in the same ligation buffer, except that 1 mM ATP and 10 Weiss units of T4 DNA ligase were used per 20 μl reaction. Ligations were transformed into DG101 host cells. Candidates were screened for appropriate restriction sites, and the DNA sequences around the cloning sites were confirmed. The desired plasmids were designated pTma08 (NdeI site at ATG) and pTma09 (NcoI site at ATG).

C. Assembling the Full-Length Gene in P_(L) Expression Vectors

The following table describes P_(L) promoter expression vectors used for assembling and expressing full-length Tma Pol I under the control of the λP_(L) promoter.

    __________________________________________________________________________
                                       Oligonucleotide Duplexes
    Vector  Site at ATG  RBS*  AsuII+/-**  Cloned into pDG160 or pDG161  Amp/Tet***
    __________________________________________________________________________
    pDG174  ##STR1##     T7    -           DG106/DG107                   Amp
    pDG178  ##STR2##     N     -           DG110/DG111                   Amp
    pDG182  ##STR3##     T7    +           FL42/FL43                     Amp
    pDG184  ##STR4##     N     +           FL44/FL45                     Amp
    pDG185  ##STR5##     N     +           FL44/FL45                     Tet
    __________________________________________________________________________
    *RBS: phage T7 gene 10 or lambda gene N ribosome binding site.
    ##STR6##
    ***Antibiotic resistance determinant: ampicillin or tetracycline.

The five vectors in the table are derivatives of plasmid pDG160, if ampicillin resistant, or pDG161, if tetracycline resistant. Plasmids pDG160 and pDG161 and the scheme for constructing vectors similar to the pDG vectors shown in the table are described in Ser. No. 455,967, filed Dec. 22, 1989, incorporated herein by reference. The vectors confer ampicillin or tetracycline resistance and all contain the δ-toxin positive retroregulator from Bacillus thuringiensis and the same point mutations in the RNA II gene that render the plasmids temperature sensitive for copy number.

The probes and oligonucleotides described in the Table are shown below.

    __________________________________________________________________________
    DG240  SEQ ID NO: 16  5'CCATCAAAAAGAAATAGTCTAGCCATATGTGTTTCCTGTGTGAAATTG
    DG241  SEQ ID NO: 17  5'AAACACATATGGCTAGAC
    DG244  SEQ ID NO: 18  5'CCATCAAAAAGAAATAGTCTAGCCATGGTTGTTTCCTGTGTGAAATTG
    DG245  SEQ ID NO: 19  5'AAACAACCATGGCTAGAC
    DG238  SEQ ID NO: 20  5'GCAAAACATGGTCGTGATATCGGATCCGGAGGTGTTATCTGTGG
    DG239  SEQ ID NO: 21  5'CCGATATCACGACCATG
    DG106  SEQ ID NO: 22  5'CCGGAAGAAGGAGATATACATATGAGCT
    DG107  SEQ ID NO: 23  5'CATATGTATATCTCCTTCTT
    DG110  SEQ ID NO: 24  5'CCGGAGGAGAAAACATATGAGCT
    DG111  SEQ ID NO: 25  5'CATATGTTTTCTCCT
    FL42   SEQ ID NO: 26  5'CCGGAAGAAGGAGAAAATACCATGGGCCCGGTAC
    FL43   SEQ ID NO: 27  5'CGGGCCCATGGTATTTTCTCCTTCTT
    FL44   SEQ ID NO: 28  5'CCGGAGGAGAAAATCCATGGGCCCGGTAC
    FL45   SEQ ID NO: 29  5'CGGGCCCATGGATTTTCTCCT
    __________________________________________________________________________

A three-fragment ligation was used to assemble the Tma Pol I gene in the vectors. The vectors are digested with SmaI and either NdeI (pDG174, pDG178) or NcoI (pDG182, pDG184, pDG185). The 5' end of the Tma Pol I gene is from pTma5'Nde#3 digested with NdeI and XmaI or pTma5'Nco#9 digested with NcoI and XmaI. The 3' end of the gene is from pTma3'mut#1 digested with XmaI and EcoRV and the ˜300 bp fragment purified as described above.

The plasmid pDG182 shown in the Table and the scheme above were used to construct expression vector pTma13. The plasmid pDG184 and the scheme above were used to construct expression vectors pTma12-1 and pTma12-3. Plasmid pTma12-3 differs from pTma12-1 in that pTma12-3 is a dimer of pTma12-1 produced during the same ligation/transformation protocol. The plasmid pDG185 and the scheme shown above were used to construct expression vector pTma11.

Even though a vector may contain the entire polymerase coding sequence, a shortened form of the enzyme can be expressed either exclusively or in combination with a full length polymerase. These shortened forms of Tma DNA polymerase result from translation initiation occurring at one of the methionine (ATG) codons in the coding sequence other than the 5'-ATG. The monomeric pTma12-1 plasmid produces, upon heat induction, predominantly a biologically active thermostable DNA polymerase lacking amino acids 1 through 139 of native Tma DNA polymerase. This approximately 86 kDa protein is the result of translation initiation at the methionine codon at position 140 of the Tma coding sequence and is called MET140.

In shake flask studies under the appropriate conditions (heat induction at 34° C. or 36° C., but not 38° C.), the multimeric pTma12-3 expression vector yielded a significant level of "full length" Tma DNA polymerase (approximately 97 kDa by SDS-PAGE) and a smaller amount of the shortened (approximately 86 kDa) form resulting from translation initiation at Met 140. Amino-acid sequencing of the full length Tma DNA polymerase indicated that the amino-terminal methionine was removed and the second-position alanine was present at the N-terminus.

Recombinant Tma DNA polymerase was purified from E. coli strain DG116 containing plasmid pTma12-3. The seed flask for a 10 L fermentation contained tryptone (20 g/l), yeast extract (10 g/l), NaCl (10 g/l), ampicillin (100 mg/l), and thiamine (10 mg/l). The seed flask was inoculated with a colony from an agar plate (a frozen glycerol culture can be used). The seed flask was grown at 30° C. to between 0.5 and 2.0 O.D. (A₆₈₀). The volume of seed culture inoculated into the fermentor is calculated such that the bacterial concentration is 0.5 mg dry weight/liter. The 12.5 liter growth medium contained 60 mM K₂HPO₄, 16 mM NaNH₄HPO₄, 10 mM citric acid, and 1 mM MgSO₄. The following sterile components were added: 2 g/l glucose, 10 mg/l thiamine, 2.5 g/l casamino acids, 100 mg/l ampicillin, and 100 mg/l methicillin. Foaming was controlled by the addition of propylene glycol as necessary, as an antifoaming agent. Airflow was maintained at 2 l/min.

The fermentor was inoculated as described above, and the culture was grown at 30° C. for 4.5 hours to a cell density (A₆₈₀) of 0.7. The growth temperature was shifted to 35° C. to induce the synthesis of recombinant Tma DNA polymerase. The temperature shift increases the copy number of the pTma12-3 plasmid and simultaneously derepresses the lambda P_(L) promoter controlling transcription of the modified Tma DNA polymerase gene through inactivation of the temperature-sensitive cI repressor encoded by the defective prophage lysogen in the host. The cells were grown for 21 hours to an optical density of 4 (A₆₈₀) and harvested by centrifugation. The resulting cell paste was stored at -70° C.

Recombinant Tma DNA polymerase is purified as in Example 6, below. Briefly, cells are thawed in 1 volume of TE buffer (50 mM Tris-Cl, pH 7.5, and 1.0 mM EDTA with 1 mM DTT), and protease inhibitors are added (PMSF to 2.4 mM, leupeptin to 1 μg/ml, and TLCK to 0.2 mM). The cells are lysed in an Aminco French pressure cell at 20,000 psi and sonicated to reduce viscosity. The sonicate is diluted with TE buffer and protease inhibitors to 5.5X wet weight cell mass (Fraction I), adjusted to 0.3M ammonium sulfate, and brought rapidly to 75° C. and maintained at 75° C. for 15 min. The heat-treated sample is chilled rapidly to 0° C., and the E. coli cell membranes and denatured proteins are removed by centrifugation at 20,000 X G for 30 min. The supernatant containing Tma DNA polymerase (Fraction II) is saved. The level of Polymin P necessary to precipitate >95% of the nucleic acids is determined by trial precipitation (usually in the range of 0.6 to 1% w/v). The desired amount of Polymin P is added slowly with rapid stirring at 0° C. for 30 min., and the suspension is centrifuged at 20,000 X G for 30 min. to remove the precipitated nucleic acids. The supernatant (Fraction III) containing the Tma DNA polymerase is saved.

Fraction III is applied to a phenyl sepharose column that has been equilibrated in 50 mM Tris-Cl, pH 7.5, 0.3M ammonium sulfate, 10 mM EDTA, and 1 mM DTT. The column is washed with 2 to 4 column volumes of the same buffer (A₂₈₀ to baseline), and then 1 to 2 column volumes of TE buffer containing 100 mM KCl to remove most contaminating E. coli proteins. Tma DNA polymerase is then eluted from the column with buffer containing 50 mM Tris-Cl, pH 7.5, 2M urea, 20% (w/v) ethylene glycol, 10 mM EDTA, and 1 mM DTT, and fractions containing DNA polymerase activity are pooled (Fraction IV).

Final purification of recombinant Tma DNA polymerase is achieved using heparin sepharose chromatography (as for native or MET284 recombinant DNA polymerase), anion exchange chromatography, or affigel blue chromatography. Recombinant Tma DNA polymerase may be diafiltered into 2.5X storage buffer, combined with 1.5 volumes of sterile 80% (w/v) glycerol, and stored at -20° C.

EXAMPLE 6 Expression of a Truncated Tma Polymerase MET284

As noted above, expression plasmids containing the complete Tma gene coding sequence expressed either a full length polymerase resulting from translation initiation at the start codon or a shortened polymerase resulting from translation initiation occurring at the methionine codon at position 140. A third methionine codon that can act as a translation initiation site occurs at position 284 of the Tma gene coding sequence. Plasmids that express a DNA polymerase lacking amino acids 1 through 283 of native Tma DNA polymerase were constructed by deleting the corresponding regions of the Tma coding sequence.

Plasmid pTma12-1 was digested with BspHI (nucleotide position 848) and HindIII (nucleotide position 2629). A 1781 base pair fragment was isolated by agarose gel purification. To separate the agarose from the DNA, a gel slice containing the desired fragment was frozen at -20° C. in a Costar spinex filter unit. After thawing at room temperature, the unit was spun in a microfuge. The filtrate containing the DNA was concentrated in a Speed Vac concentrator, and the DNA was precipitated with ethanol.

The isolated fragment was cloned into plasmid pTma12-1 digested with NcoI and HindIII. Because NcoI digestion leaves the same cohesive end sequence as digestion with BspHI, the 1781 base pair fragment has the same cohesive ends as the full length fragment excised from plasmid pTma12-1 by digestion with NcoI and HindIII. The ligation of the isolated fragment with the digested plasmid results in a fragment switch and was used to create a plasmid designated pTma14.
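The fragment length and the end compatibility stated above can be checked with a short calculation. The following Python sketch is illustrative only and is not part of the described procedure. It assumes the standard published recognition sites for NcoI (C^CATGG), BspHI (T^CATGA), and HindIII (A^AGCTT); the names `SITES` and `overhang` are hypothetical helpers introduced for this illustration.

```python
# Recognition sites written 5'->3', with '^' marking the top-strand cut.
# Assumption: these are the standard published sites for the three enzymes.
SITES = {
    "NcoI": "C^CATGG",
    "BspHI": "T^CATGA",
    "HindIII": "A^AGCTT",
}


def overhang(site: str) -> str:
    """Return the 5' overhang left by a palindromic cutter."""
    cut = site.index("^")
    seq = site.replace("^", "")
    # For a palindromic site the bottom strand is cut at the mirror position,
    # so the single-stranded overhang spans the bases between the two cuts.
    return seq[cut:len(seq) - cut]


# Fragment length from the two nucleotide positions quoted in the text.
fragment_bp = 2629 - 848

ncoi_end = overhang(SITES["NcoI"])
bsphi_end = overhang(SITES["BspHI"])

print(f"BspHI/HindIII fragment: {fragment_bp} bp")
print(f"NcoI overhang: {ncoi_end}, BspHI overhang: {bsphi_end}")
print(f"Compatible ends: {ncoi_end == bsphi_end}")
```

Under these assumptions both enzymes leave the same four-base 5' overhang, which is why the BspHI-cut fragment can replace the NcoI-cut fragment in the fragment switch.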

Plasmid pTma15 was similarly constructed by cloning the same isolated fragment into pTma13. As with pTma14, pTma15 drives expression of a polymerase that lacks amino acids 1 through 283 of native Tma DNA polymerase; translation initiates at the methionine codon at position 284 of the native coding sequence.

Both the pTma14 and pTma15 expression plasmids expressed at a high level a biologically active thermostable DNA polymerase of molecular weight of about 70 kDa; plasmid pTma15 expressed polymerase at a higher level than did pTma14. Based on similarities with E. coli Pol I Klenow fragment, such as conservation of amino acid sequence motifs in all three domains that are critical for 3'-5' exonuclease activity, distance from the amino terminus to the first domain critical for exonuclease activity, and length of the expressed protein, the shortened form (MET284) of Tma polymerase should possess 3'-5' exonuclease and proof-reading activity but lack 5'-3' exonuclease activity. However, initial SDS activity gel assays and solution assays for 3'-5' exonuclease activity suggested significant attenuation in the proof-reading activity of the polymerase expressed by E. coli host cells harboring plasmid pTma15.

MET284 Tma DNA polymerase was purified from E. coli strain DG116 containing plasmid pTma15. The seed flask for a 10 L fermentation contained tryptone (20 g/l), yeast extract (10 g/l), NaCl (10 g/l), glucose (10 g/l), ampicillin (50 mg/l), and thiamine (10 mg/l). The seed flask was inoculated with a colony from an agar plate (a frozen glycerol culture can be used). The seed flask was grown at 30° C. to between 0.5 and 2.0 O.D. (A₆₈₀). The volume of seed culture inoculated into the fermentor is calculated such that the bacterial concentration is 0.5 mg dry weight/liter. The 10 liter growth medium contained 25 mM KH₂PO₄, 10 mM (NH₄)₂SO₄, 4 mM sodium citrate, 0.4 mM FeCl₃, 0.04 mM ZnCl₂, 0.03 mM CoCl₂, 0.03 mM CuCl₂, and 0.03 mM H₃BO₃. The following sterile components were added: 4 mM MgSO₄, 20 g/l glucose, 20 mg/l thiamine, and 50 mg/l ampicillin. The pH was adjusted to 6.8 with NaOH and controlled during the fermentation by adding NH₄OH. Glucose was continually added by coupling to NH₄OH addition. Foaming was controlled by the addition of propylene glycol as necessary, as an antifoaming agent. Dissolved oxygen concentration was maintained at 40%.

The fermentor was inoculated as described above, and the culture was grown at 30° C. to a cell density of 0.5 to 1.0×10¹⁰ cells/ml (optical density [A₆₈₀] of 15). The growth temperature was shifted to 38° C. to induce the synthesis of MET284 Tma DNA polymerase. The temperature shift increases the copy number of the pTma15 plasmid and simultaneously derepresses the lambda P_(L) promoter controlling transcription of the modified Tma DNA polymerase gene through inactivation of the temperature-sensitive cI repressor encoded by the defective prophage lysogen in the host.

The cells were grown for 6 hours to an optical density of 37 (A₆₈₀) and harvested by centrifugation. The cell mass (ca. 95 g/l) was resuspended in an equivalent volume of buffer containing 50 mM Tris-Cl, pH 7.6, 20 mM EDTA and 20% (w/v) glycerol. The suspension was slowly dripped into liquid nitrogen to freeze the suspension as "beads" or small pellets. The frozen cells were stored at -70° C.

To 200 g of frozen beads (containing 100 g wet weight cells) were added 100 ml of 1X TE (50 mM Tris-Cl, pH 7.5, 10 mM EDTA) and DTT to 0.3 mM, PMSF to 2.4 mM, leupeptin to 1 μg/ml and TLCK (a protease inhibitor) to 0.2 mM. The sample was thawed on ice and uniformly resuspended in a blender at low speed. The cell suspension was lysed in an Aminco French pressure cell at 20,000 psi. To reduce viscosity, the lysed cell sample was sonicated 4 times for 3 min. each at 50% duty cycle and 70% output. The sonicate was adjusted to 550 ml with 1X TE containing 1 mM DTT, 2.4 mM PMSF, 1 μg/ml leupeptin and 0.2 mM TLCK (Fraction I). After addition of ammonium sulfate to 0.3M, the crude lysate was rapidly brought to 75° C. in a boiling water bath and transferred to a 75° C. water bath for 15 min. to denature and inactivate E. coli host proteins. The heat-treated sample was chilled rapidly to 0° C. and incubated on ice for 20 min. Precipitated proteins and cell membranes were removed by centrifugation at 20,000 X G for 30 min. at 5° C. and the supernatant (Fraction II) saved.

The heat-treated supernatant (Fraction II) was treated with polyethyleneimine (PEI) to remove most of the DNA and RNA. Polymin P (34.96 ml of 10% [w/v], pH 7.5) was slowly added to 437 ml of Fraction II at 0° C. while stirring rapidly. After 30 min. at 0° C., the sample was centrifuged at 20,000 X G for 30 min. The supernatant (Fraction III) was applied at 80 ml/hr to a 100 ml phenyl sepharose column (3.2×12.5 cm) that had been equilibrated in 50 mM Tris-Cl, pH 7.5, 0.3M ammonium sulfate, 10 mM EDTA, and 1 mM DTT. The column was washed with about 200 ml of the same buffer (A₂₈₀ to baseline) and then with 150 ml of 50 mM Tris-Cl, pH 7.5, 100 mM KCl, 10 mM EDTA and 1 mM DTT. The MET284 Tma DNA polymerase was then eluted from the column with buffer containing 50 mM Tris-Cl, pH 7.5, 2M urea, 20% (w/v) ethylene glycol, 10 mM EDTA, and 1 mM DTT, and fractions containing DNA polymerase activity were pooled (Fraction IV).
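The Polymin P addition above can be related to the 0.6 to 1% (w/v) trial-precipitation range mentioned earlier. The following Python sketch is an illustrative calculation only; it assumes that volumes are simply additive, and the variable names are hypothetical.

```python
# Values taken from the text.
stock_percent = 10.0     # % (w/v) Polymin P stock solution
stock_ml = 34.96         # volume of stock added
fraction_ii_ml = 437.0   # volume of Fraction II

# Concentration expressed relative to the starting volume of Fraction II.
nominal = stock_percent * stock_ml / fraction_ii_ml

# Concentration in the mixture, assuming the two volumes are additive.
final = stock_percent * stock_ml / (fraction_ii_ml + stock_ml)

print(f"relative to Fraction II volume: {nominal:.2f}% (w/v)")
print(f"in the final mixture:           {final:.2f}% (w/v)")
```

Either way of expressing the concentration falls inside the 0.6 to 1% range; the text does not state which convention was used.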

Fraction IV was adjusted to a conductivity equivalent to 50 mM KCl in 50 mM Tris-Cl, pH 7.5, 1 mM EDTA, and 1 mM DTT. The sample was applied (at 9 ml/hr) to a 15 ml heparin-sepharose column that had been equilibrated in the same buffer. The column was washed with the same buffer at ca. 14 ml/hr (3.5 column volumes) and eluted with a 150 ml 0.05 to 0.5M KCl gradient in the same buffer. The DNA polymerase activity eluted between 0.11 and 0.22M KCl. Fractions containing the pTma15-encoded modified Tma DNA polymerase were pooled, concentrated, and diafiltered against 2.5X storage buffer (50 mM Tris-Cl, pH 8.0, 250 mM KCl, 0.25 mM EDTA, 2.5 mM DTT, and 0.5% Tween 20), subsequently mixed with 1.5 volumes of sterile 80% (w/v) glycerol, and stored at -20° C. Optionally, the heparin sepharose-eluted DNA polymerase or the phenyl sepharose-eluted DNA polymerase can be dialyzed or adjusted to a conductivity equivalent to 50 mM KCl in 50 mM Tris-Cl, pH 7.5, 1 mM DTT, 1 mM EDTA, and 0.2% Tween 20 and applied (1 mg protein/ml resin) to an affigel blue column that has been equilibrated in the same buffer. The column is washed with three to five column volumes of the same buffer and eluted with a 10 column volume KCl gradient (0.05 to 0.8M) in the same buffer. Fractions containing DNA polymerase activity (eluting between 0.25 and 0.4M KCl) are pooled, concentrated, diafiltered, and stored as above.
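The "2.5X" designation of the storage buffer follows from the glycerol dilution step. The following Python sketch is an illustrative calculation only; it assumes that one volume of enzyme in 2.5X buffer is combined with 1.5 volumes of 80% glycerol and that volumes are additive. The dictionary keys and variable names are hypothetical labels.

```python
# Composition of the 2.5X storage buffer, from the text.
buffer_2_5x = {
    "Tris-Cl (mM)": 50.0,
    "KCl (mM)": 250.0,
    "EDTA (mM)": 0.25,
    "DTT (mM)": 2.5,
    "Tween 20 (%)": 0.5,
}

buffer_volumes = 1.0            # enzyme in 2.5X storage buffer
glycerol_volumes = 1.5          # sterile glycerol added, from the text
glycerol_stock_percent = 80.0   # % (w/v) glycerol stock, from the text

total_volumes = buffer_volumes + glycerol_volumes

# Each buffer component is diluted by the same factor (1 / 2.5).
final = {
    name: conc * buffer_volumes / total_volumes
    for name, conc in buffer_2_5x.items()
}
final["glycerol (% w/v)"] = glycerol_stock_percent * glycerol_volumes / total_volumes

for name, conc in final.items():
    print(f"{name}: {conc:g}")
```

Under these assumptions the 2.5-fold dilution brings the buffer to 1X strength while leaving the enzyme in roughly half-strength glycerol for storage at -20° C.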

The relative thermoresistance of various DNA polymerases has been compared. At 97.5° C. the half-life of native Tma DNA polymerase is more than twice the half-life of either native or recombinant Taq (i.e., AmpliTaq®) DNA polymerase. Surprisingly, the half-life at 97.5° C. of MET284 Tma DNA polymerase is 2.5 to 3 times longer than the half-life of native Tma DNA polymerase.

PCR tubes containing 10 mM Tris-Cl, pH 8.3, and 1.5 mM MgCl₂ (for Taq or native Tma DNA polymerase) or 3 mM MgCl₂ (for MET284 Tma DNA polymerase), 50 mM KCl (for Taq, native Tma and MET284 Tma DNA polymerases) or no KCl (for MET284 Tma DNA polymerase), 0.5 μM each of primers PCR01 and PCR02, 1 ng of lambda template DNA, 200 μM of each dNTP except dCTP, and 4 units of each enzyme were incubated at 97.5° C. in a large water bath for times ranging from 0 to 60 min. Samples were withdrawn with time, stored at 0° C., and 5 μl assayed at 75° C. for 10 min. in a standard activity assay for residual activity.

Taq DNA polymerase had a half-life of about 10 min. at 97.5° C., while native Tma DNA polymerase had a half-life of about 21 to 22 min. at 97.5° C. Surprisingly, the MET284 form of Tma DNA polymerase had a significantly longer half-life (50 to 55 min.) than either Taq or native Tma DNA polymerase. The improved thermoresistance of MET284 Tma DNA polymerase will find applications in PCR, particularly where G+C-rich targets are difficult to amplify because the strand-separation temperature required for complete denaturation of target and PCR product sequences leads to enzyme inactivation.
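The practical effect of these half-lives can be illustrated by estimating the activity remaining after a given time at 97.5° C. The following Python sketch is illustrative only. It assumes simple first-order (single-exponential) inactivation, which the text does not state; it uses the midpoints of the reported half-life ranges; and the 25 min. cumulative exposure is a hypothetical figure chosen for the example.

```python
def residual_fraction(minutes: float, half_life_min: float) -> float:
    """Fraction of activity remaining, assuming first-order inactivation."""
    return 0.5 ** (minutes / half_life_min)


# Half-lives at 97.5 C from the text; range midpoints are an assumption.
half_lives_min = {
    "Taq": 10.0,
    "native Tma": 21.5,   # midpoint of 21 to 22 min.
    "MET284 Tma": 52.5,   # midpoint of 50 to 55 min.
}

# Hypothetical cumulative time spent at 97.5 C over a whole amplification.
exposure_min = 25.0

remaining = {
    name: residual_fraction(exposure_min, t_half)
    for name, t_half in half_lives_min.items()
}

for name, fraction in remaining.items():
    print(f"{name}: {fraction:.0%} of initial activity remaining")
```

Under these assumptions the enzyme with the longest half-life retains the largest share of its activity, which is the basis for the advantage with targets that require high strand-separation temperatures.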

PCR tubes containing 50 μl of 10 mM Tris-Cl, pH 8.3, 3 mM MgCl₂, 200 μM of each dNTP, 0.5 ng bacteriophage lambda DNA, 0.5 μM of primer PCR01, 4 units of MET284 Tma DNA polymerase, and 0.5 μM of primer PCR02 or PL10 were cycled for 25 cycles using T_(den) of 96° C. for 1 min. and T_(anneal-extend) of 60° C. for 2 min. Lambda DNA template, deoxynucleotide stock solutions, and primers PCR01 and PCR02 were part of the PECI GeneAmp® kit. Primer PL10 has the sequence: (SEQ ID NO. 45) 5'-GGCGTACCTTTGTCTCACGGGCAAC-3' and is complementary to bacteriophage lambda nucleotides 8106-8130.

The primers PCR01 and PCR02 amplify a 500 bp product from lambda. The primer pair PCR01 and PL10 amplifies a 1 kb product from lambda. After amplification with the respective primer sets, 5 μl aliquots were subjected to agarose gel electrophoresis and the specific intended product bands visualized with ethidium bromide staining. Abundant levels of product were generated with both primer sets, showing that MET284 Tma DNA polymerase successfully amplified the intended target sequence.

EXAMPLE 7 Expression of Truncated Tma Polymerase

As noted above, host cells transformed with plasmids that contain the complete Tma DNA polymerase gene coding sequence express a shortened form (MET140) of Tma polymerase either exclusively or along with the full length polymerase. Mutations can be made to control which form of the polymerase is expressed. To enhance the exclusive expression of the MET140 form of the polymerase, the coding region corresponding to amino acids 1 through 139 was deleted from the expression vector. The protocol for constructing such a deletion is similar to the construction described in Example 6: a shortened gene fragment is excised and then reinserted into a vector from which a full length fragment has been excised. However, the shortened fragment can be obtained as a PCR amplification product rather than purified from a restriction digest. This methodology allows a new upstream restriction site (or other sequences) to be incorporated where useful.

To delete the region up to the methionine codon at position 140, an SphI site was introduced into pTma12-1 and pTma13 using PCR. A forward primer (FL63) was designed to introduce the SphI site just upstream of the methionine codon at position 140. The reverse primer (FL69) was chosen to include an XbaI site at position 624. Plasmid pTma12-1 linearized with SmaI was used as the PCR template, yielding a 225 bp PCR product.

Before digestion, the PCR product was treated with 50 μg/ml of Proteinase K in PCR reaction mix plus 0.5% SDS and 5 mM EDTA. After incubating for 30 minutes at 37° C., the Proteinase K was heat inactivated at 68° C. for 10 minutes. This procedure eliminated any Taq polymerase bound to the product that could inhibit subsequent restriction digests. The buffer was changed to a TE buffer, and the excess PCR primers were removed with a Centricon 100 microconcentrator.

The amplified fragment was digested with SphI, then treated with Klenow to create a blunt end at the SphI-cleaved end, and finally digested with XbaI. The resulting fragment was ligated with plasmid pTma13 (pTma12-1 would have been suitable) that had been digested with NcoI, repaired with Klenow, and then digested with XbaI. The ligation yielded an in-frame coding sequence with the region between the initial NcoI site (upstream of the first methionine codon of the coding sequence) and the introduced SphI site (upstream of the methionine codon at position 140) deleted. The resulting expression vector was designated pTma16.

The primers used in this example are given below and in the Sequence Listing section.

    __________________________________________________________________________
    Primer  SEQ ID NO:     Sequence
    __________________________________________________________________________
    FL63    SEQ ID NO: 30  5'GATAAAGGCATGCTTCAGCTTGTGAACG
    FL69    SEQ ID NO: 31  5'TGTACTTCTCTAGAAGCTGAACAGCAG
    __________________________________________________________________________

EXAMPLE 8 Elimination of Undesired RBS in MET140 Expression Vectors

Reduced expression of the MET140 form of Tma DNA polymerase can be achieved by eliminating the ribosome binding site (RBS) upstream of the methionine codon at position 140. The RBS was eliminated via oligonucleotide site-directed mutagenesis without changing the amino acid sequence. Taking advantage of the redundancy of the genetic code, one can make changes in the third position of codons to alter the nucleic acid sequence, thereby eliminating the RBS, without changing the amino acid sequence of the encoded protein.
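The principle of removing an RBS-like motif through third-position (silent) codon changes can be illustrated as follows. The Python sketch below is illustrative only: the nine-nucleotide sequences and the motif `GGAGG` are hypothetical examples chosen for the illustration and are not taken from the Tma coding sequence or from primer FL64. The standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence into one-letter amino acid codes."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Hypothetical purine-rich motif standing in for a ribosome binding site.
RBS_MOTIF = "GGAGG"

# Hypothetical in-frame sequences: AAG GAG GTG versus AAA GAA GTG.
original = "AAGGAGGTG"
silent = "AAAGAAGTG"   # only the third position of two codons is changed

print(translate(original), RBS_MOTIF in original)
print(translate(silent), RBS_MOTIF in silent)
```

In this hypothetical case the encoded peptide is unchanged while the motif is no longer present, which is the same logic applied with the mutagenic primer described below.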

A mutagenic primer (FL64) containing the modified sequence was synthesized and phosphorylated. Single-stranded pTma09 (a full length clone having an NcoI site) was prepared by coinfecting with the helper phage R408, commercially available from Stratagene. A "gapped duplex" of single stranded pTma09 and the large fragment from the PvuII digestion of pBS13+ was created by mixing the two plasmids, heating to boiling for 2 minutes, and cooling to 65° C. for 5 minutes. The phosphorylated primer was then annealed with the "gapped duplex" by mixing, heating to 80° C. for 2 minutes, and then cooling slowly to room temperature. The remaining gaps were filled by extension with Klenow and the fragments ligated with T4 DNA ligase, both reactions taking place in 200 μM of each dNTP and 40 μM ATP in standard salts at 37° C. for 30 minutes.

The resulting circular fragment was transformed into DG101 host cells by plate transformations on nitrocellulose filters. Duplicate filters were made and the presence of the correct plasmid was detected by probing with a γ-³²P-phosphorylated probe (FL65). The vector that resulted was designated pTma19.

The RBS minus portion from pTma19 was cloned into pTma12-1 via an NcoI/XbaI fragment switch. Plasmid pTma19 was digested with NcoI and XbaI, and the 620 bp fragment was purified by gel electrophoresis, as in Example 7, above. Plasmid pTma12-1 was digested with NcoI, XbaI, and XcmI. The XcmI cleavage inactivates the RBS+ fragment for the subsequent ligation step, which is done under conditions suitable for ligating "sticky" ends (dilute ligase and 40 μM ATP). Finally, the ligation product is transformed into DG116 host cells for expression and designated pTma19-RBS.

The oligonucleotide sequences used in this example are listed below and in the Sequence Listing section.

    __________________________________________________________________________
    Oligo  SEQ ID NO:     Sequence
    __________________________________________________________________________
    FL64   SEQ ID NO: 32  5'CTGAAGCATGTCTTTGTCACCGGTTACTATGAATAT
    FL65   SEQ ID NO: 33  5'TAGTAACCGGTGACAAAG
    __________________________________________________________________________

EXAMPLE 9 Expression of Truncated Tma DNA Polymerase MET ASP21

To effect translation initiation at about the aspartic acid codon at position 21 of the Tma DNA polymerase gene coding sequence, a methionine codon is introduced before the codon, and the region from the initial NcoI site to this introduced methionine codon is deleted. The deletion process involves PCR with the same downstream primer described above (FL69) and with an upstream primer (FL66) designed to incorporate an NcoI site and a methionine codon to yield a 570 base pair product.

The amplified product is concentrated with a Centricon-100 microconcentrator to eliminate excess primers and buffer. The product is concentrated in a Speed Vac concentrator and then resuspended in the digestion mix. The amplified product is digested with NcoI and XbaI. Likewise, pTma12-1, pTma13, or pTma19-RBS is digested with the same two restriction enzymes, and the digested, amplified fragment is ligated with the digested expression vector. The resulting construct has a deletion from the NcoI site upstream of the start codon of the native Tma coding sequence to the new methionine codon introduced upstream of the aspartic acid codon at position 21 of the native Tma coding sequence.

Similarly, a deletion mutant can be created such that translation initiation begins at Glu74, the glutamic acid codon at position 74 of the native Tma coding sequence. An upstream primer (FL67) is designed to introduce a methionine codon and an NcoI site before Glu74. The downstream primer and cloning protocol used are as described above for the MET-ASP21 construct.

The upstream primer sequences used in this example are listed below and in the Sequence Listing section.

    __________________________________________________________________________
    Oligo  SEQ ID NO:     Sequence
    __________________________________________________________________________
    FL66   SEQ ID NO: 34  5'CTATGCCATGGATAGATCGCTTTCTACTTCC
    FL67   SEQ ID NO: 35  5'CAAGCCCATGGAAACTTACAAGGCTCAAAGA
    __________________________________________________________________________
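The restriction sites and initiation codons that the primers of Examples 7 and 9 are designed to introduce can be confirmed by inspecting the primer sequences. The following Python sketch is illustrative only; it assumes the standard recognition sequences for NcoI (CCATGG), SphI (GCATGC), and XbaI (TCTAGA), and the helper names `sites_in` and `codon_after_ncoi_atg` are hypothetical.

```python
# Assumption: standard recognition sequences for the three enzymes.
SITES = {"NcoI": "CCATGG", "SphI": "GCATGC", "XbaI": "TCTAGA"}

# Primer sequences as listed in the tables of Examples 7 and 9.
PRIMERS = {
    "FL63": "GATAAAGGCATGCTTCAGCTTGTGAACG",
    "FL69": "TGTACTTCTCTAGAAGCTGAACAGCAG",
    "FL66": "CTATGCCATGGATAGATCGCTTTCTACTTCC",
    "FL67": "CAAGCCCATGGAAACTTACAAGGCTCAAAGA",
}


def sites_in(seq: str) -> list[str]:
    """Names of the listed enzymes whose recognition sequence occurs in seq."""
    return [name for name, site in SITES.items() if site in seq]


def codon_after_ncoi_atg(seq: str) -> str:
    """Codon that follows the ATG embedded in the NcoI site (CC-ATG-G)."""
    atg_start = seq.index(SITES["NcoI"]) + 2
    return seq[atg_start + 3:atg_start + 6]


for name, seq in PRIMERS.items():
    print(name, sites_in(seq))

# GAT encodes aspartic acid and GAA encodes glutamic acid.
print("FL66:", codon_after_ncoi_atg(PRIMERS["FL66"]))
print("FL67:", codon_after_ncoi_atg(PRIMERS["FL67"]))
```

The check is consistent with the text: FL63 carries an SphI site, FL69 an XbaI site, and FL66 and FL67 each carry an NcoI site whose embedded ATG is followed directly by an Asp or Glu codon, respectively.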

EXAMPLE 10 Expression Vectors With T7 Promoters

Expression efficiency can be altered by changing the promoter and/or ribosomal binding site (RBS) in an expression vector. The T7 Gene 10 promoter and RBS have been used to drive expression of Tma DNA polymerase from expression vector pTma17, and the T7 Gene 10 promoter and the Gene N RBS have been used to drive expression of Tma DNA polymerase from expression vector pTma18. The construction of these vectors took advantage of unique restriction sites present in pTma12-1: an AflII site upstream of the promoter, an NcoI site downstream of the RBS, and a BspEI site between the promoter and the RBS. The existing promoter was excised from pTma12-1 and replaced with a synthetic T7 Gene 10 promoter using techniques similar to those described in the previous examples.

The synthetic insert was created from two overlapping synthetic oligonucleotides. To create pTma17 (with T7 Gene 10 RBS), equal portions of FR414 and FR416 were mixed, heated to boiling, and cooled slowly to room temperature. The hybridized oligonucleotides were extended with Klenow to create a full length double stranded insert. The extended fragment was then digested with AflII and NcoI, leaving the appropriate "sticky" ends. The insert was cloned into plasmid pTma12-1 digested with AflII and NcoI. DG116 host cells were transformed with the resulting plasmid and transformants screened for the desired plasmid.

The same procedure was used in the creation of pTma18 (with Gene N RBS), except that FR414 and FR418 were used, and the extended fragment was digested with AflII and BspEI. This DNA fragment was substituted for the PL promoter in plasmid pTma12-1 that had been digested with AflII and BspEI.

Plasmids pTma17 and pTma18 are used to transform E. coli host cells that have been modified to contain an inducible T7 RNA polymerase gene.

The oligonucleotides used in the construction of these vectors are listed below and in the Sequence Listing section.

    __________________________________________________________________________
    FR414   SEQ ID NO: 36   5'TCAGCTTAAGACTTCGAAATTAATACGACTCACTATAGGGAGACCACAACGGTTTCCCTC
    FR416   SEQ ID NO: 37   5'TCGACCATGGGTATATCCTTCTTAAAGTTAAACAAAATTATTCTAGAGGGAAACCGTTG
    FR418   SEQ ID NO: 38   5'TCAGTCCGGATAAACAAAATTATTTCTAGAGGGAAACCGTTG
    __________________________________________________________________________
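The annealing and fill-in step described in this Example can be illustrated with a short Python sketch. It is not part of the original procedure. It assumes that the oligonucleotide sequences are exactly as printed above (with the hyphens treated as line continuations) and that Klenow extension simply copies each strand to the end of its partner. The function names are illustrative only.

```python
# Oligonucleotides as listed in this Example, written 5' to 3'.
# Assumption: hyphens in the printed table are line continuations only.
FR414 = "TCAGCTTAAGACTTCGAAATTAATACGACTCACTATAGGGAGACCACAACGGTTTCCCTC"
FR416 = "TCGACCATGGGTATATCCTTCTTAAAGTTAAACAAAATTATTCTAGAGGGAAACCGTTG"
FR418 = "TCAGTCCGGATAAACAAAATTATTTCTAGAGGGAAACCGTTG"

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def three_prime_overlap(top: str, bottom: str) -> int:
    """Length of the longest 3' end of `top` that base-pairs with the 3' end of `bottom`."""
    bottom_as_top = reverse_complement(bottom)
    for k in range(min(len(top), len(bottom_as_top)), 0, -1):
        if top.endswith(bottom_as_top[:k]):
            return k
    return 0


def anneal_and_extend(top: str, bottom: str) -> str:
    """Top strand of the duplex obtained by annealing two oligos and filling in both 3' ends."""
    k = three_prime_overlap(top, bottom)
    if k == 0:
        raise ValueError("oligonucleotides do not anneal at their 3' ends")
    return top + reverse_complement(bottom)[k:]


# Recognition sequences of the enzymes named in the text.
SITES = {"AflII": "CTTAAG", "NcoI": "CCATGG", "BspEI": "TCCGGA"}

insert_17 = anneal_and_extend(FR414, FR416)  # insert for pTma17
insert_18 = anneal_and_extend(FR414, FR418)  # insert for pTma18

for name, insert in (("pTma17 insert", insert_17), ("pTma18 insert", insert_18)):
    found = {enzyme: insert.find(site) for enzyme, site in SITES.items() if site in insert}
    print(name, len(insert), found)
```

Under these assumptions each filled-in duplex carries an AflII site near one end and an NcoI (pTma17) or BspEI (pTma18) site near the other, which is consistent with the digestions described above.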

EXAMPLE 11 Translational Coupling

As described above, translational coupling can increase the efficiency of expression of a protein by coupling a short coding sequence just upstream of the initiation site of the coding sequence for the protein. Termination of translation of the upstream coding sequence leaves the ribosome in close proximity to the initiation site for the downstream coding sequence. The upstream coding sequence functions only to move the ribosome downstream to the start of the coding sequence for the desired protein.

Translationally coupled Tma expression vectors were constructed with the translation initiation signal and first ten codons of the T7 bacteriophage major capsid protein (gene 10) fused in-frame to the last six codons of E. coli TrpE placed upstream of the Tma coding region. The TGA (stop) codon for TrpE is "coupled" with the ATG (start) codon for the Tma gene, forming the sequence TGATG. A one base frame-shift is required between translation of the short coding sequence and translation of the Tma coding sequence.

In the example below, a fragment containing the T7 Gene 10-E. coli TrpE/TrpD fusion product (the last 6 codons and TGA stop codon from TrpE along with the overlapping ATG start codon from TrpD) was transferred from a pre-existing plasmid. One of ordinary skill will recognize that the T7 Gene 10-E. coli TrpE/TrpD fusion product used in the construction of the translationally coupled expression vectors can be constructed as a synthetic oligonucleotide. The sequence for the inserted fragment is listed below and in the Sequence Listing section.

The T7 Gene 10-E. coli TrpE/TrpD fusion product was amplified from plasmid pSYC1868 with primers FL48 and FL49. With primers FL51 and FL53, the 5' end of the Tma Pol I gene in pTma08 (a full length clone containing an NdeI site) was amplified from the ATG start codon to the MroI site downstream of the ATG start codon. The primers FL51 and FL49 were designed to leave overlapping regions such that the two amplified products could be annealed and extended, essentially as described in Example 10. The two amplification products were mixed, heated to 95° C., slowly cooled to room temperature to anneal, and extended with Taq polymerase.

The extended insert was amplified with primers FL48 and FL53 and then digested with XmaI and MroI. Plasmid pTma12-1 was digested with MroI and treated with calf intestine alkaline phosphatase to prevent re-ligation. The digested pTma12-1 was ligated with the insert. DG116 host cells were transformed with the resulting construct and transformants screened for the desired plasmid DNA. The resulting vector was designated pTma20.
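How primers FL48 and FL49 relate to the Gene 10 insert can be checked with a short Python sketch. It is an illustration only. It assumes that the primer and insert sequences are exactly as listed in this Example and that the template region in pSYC1868 is identical to the listed Gene 10 insert. The helper names are hypothetical.

```python
# Sequences as listed in this Example, written 5' to 3'.
GENE10_INSERT = (
    "CTTTAAGAAGGAGATATACATATGGCTAGCATGACTGGTGGACAGCAAATG"
    "CATGCACAGGAGACTTTCTGATG"
)
FL48 = "TCCGGACTTTAAGAAGGAGATATAC"        # forward primer
FL49 = "AATAGTCTAGCCATCAGAAAGTCTCCTGTGC"  # reverse primer

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def suffix_prefix_overlap(left: str, right: str) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:k]):
            return k
    return 0


# Forward primer: its 3' end anneals to the start of the insert.
fwd_anneal = suffix_prefix_overlap(FL48, GENE10_INSERT)
fwd_tail = FL48[: len(FL48) - fwd_anneal]  # non-annealing 5' tail

# Reverse primer: compare on the top strand by taking its reverse complement.
fl49_top = reverse_complement(FL49)
rev_anneal = suffix_prefix_overlap(GENE10_INSERT, fl49_top)
rev_tail = fl49_top[rev_anneal:]  # sequence added beyond the end of the insert

# Top strand of the predicted amplification product.
amplicon = fwd_tail + GENE10_INSERT + rev_tail

print("FL48 anneals over", fwd_anneal, "nt; 5' tail:", fwd_tail)
print("FL49 anneals over", rev_anneal, "nt; added 3' sequence:", rev_tail)
print("amplicon length:", len(amplicon))
```

As transcribed here, the 5' tail of FL48 is the recognition sequence TCCGGA, and the sequence that FL49 adds downstream of the TGATG junction corresponds to the first codons of the Tma coding sequence, which is the region FL51 is said to overlap.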

The sequences of the oligonucleotide primers and the T7 Gene 10-E. coli TrpE/TrpD fusion product (Gene 10 insert) are listed below and in the Sequence Listing section.

    __________________________________________________________________________
    Primers         SEQ ID NO:      Sequence
    __________________________________________________________________________
    FL48            SEQ ID NO: 39   5'TCCGGACTTTAAGAAGGAGATATAC
    FL49            SEQ ID NO: 40   5'AATAGTCTAGCCATCAGAAAGTCTCCTGTGC
    FL51            SEQ ID NO: 41   5'AGACTTTCTGATGGCTFAGACTATTTCTT
    FL53            SEQ ID NO: 42   5'CTGAATCAGGAGACCCGGGGTCTTTGGTC
    Gene 10 insert  SEQ ID NO:      5'CTTTAAGAAGGAGATATACATATGGCTAGCATGACTGGTGGACAGCAAATG
                                      CATGCACAGGAGACTTTCTGATG
    __________________________________________________________________________
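The coupling arrangement described in this Example can be traced directly on the Gene 10 insert sequence. The following Python sketch is an illustration, not part of the original disclosure. It assumes the insert sequence is exactly as listed above and uses the standard genetic code. The function and variable names are hypothetical.

```python
# Gene 10 insert as listed in this Example, written 5' to 3'.
GENE10_INSERT = (
    "CTTTAAGAAGGAGATATACATATGGCTAGCATGACTGGTGGACAGCAAATG"
    "CATGCACAGGAGACTTTCTGATG"
)

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_from(seq: str, start: int):
    """Translate from `start` until a stop codon; return (peptide, stop index or None)."""
    peptide = []
    pos = start
    while pos + 3 <= len(seq):
        residue = CODON_TABLE[seq[pos:pos + 3]]
        if residue == "*":
            return "".join(peptide), pos
        peptide.append(residue)
        pos += 3
    return "".join(peptide), None


# Upstream (coupling) reading frame: first ATG in the insert.
upstream_start = GENE10_INSERT.find("ATG")
leader_peptide, stop_index = translate_from(GENE10_INSERT, upstream_start)

# Downstream start codon: the final ATG, which overlaps the TGA stop codon.
downstream_start = GENE10_INSERT.rfind("ATG")
junction = GENE10_INSERT[stop_index:downstream_start + 3]

# Reading-frame offset of the downstream ATG relative to the upstream frame.
frame_offset = (downstream_start - upstream_start) % 3

print("leader peptide:", leader_peptide, "(", len(leader_peptide), "codons )")
print("stop/start junction:", junction)
print("frame offset of downstream ATG:", frame_offset)
```

A frame offset of 2 means the downstream ATG begins one base before the end of the stop codon, so the ribosome must shift back by one base to reinitiate, which corresponds to the one base frame-shift mentioned in the text.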

EXAMPLE 12 Arg U tRNA Expression

The pattern of codon usage differs between Thermotoga maritima and E. coli. In the Tma coding sequence, arginine is most frequently coded for by the "AGA" codon, whereas this codon is used in low frequency in E. coli host cells. The corresponding "Arg U" tRNA appears in low concentrations in E. coli. The low concentration in the host cell of Arg tRNA using the "AGA" codon may limit the translation efficiency of the Tma polymerase gene. The efficiency of translation of the Tma coding sequence within an E. coli host may be improved by increasing the concentration of this tRNA species by cloning multiple copies of the tRNA gene into the host cell using a second expression vector that contains the gene for the "Arg U" tRNA.
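The codon bias noted above can be tallied with a few lines of Python. The sketch below is illustrative only. It counts arginine codons in the first 144 nucleotides of SEQ ID NO: 1 as given in the Sequence Listing, so the counts describe that fragment and not the whole gene. The function name is hypothetical.

```python
from collections import Counter

# The six arginine codons of the standard genetic code.
ARG_CODONS = {"CGT", "CGC", "CGA", "CGG", "AGA", "AGG"}

# First 144 nucleotides (48 codons) of the Tma coding sequence, SEQ ID NO: 1.
TMA_5PRIME = (
    "ATGGCGAGACTATTTCTCTTTGATGGAACTGCTCTGGCCTACAGAGCG"
    "TACTATGCGCTCGATAGATCGCTTTCTACTTCCACCGGCATTCCCACA"
    "AACGCCACATACGGTGTGGCGAGGATGCTGGTGAGATTCATCAAAGAC"
)


def arginine_codon_usage(cds: str) -> Counter:
    """Count each arginine codon in an in-frame coding sequence."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = (cds[i:i + 3] for i in range(0, usable, 3))
    return Counter(codon for codon in codons if codon in ARG_CODONS)


usage = arginine_codon_usage(TMA_5PRIME)
for codon, count in usage.most_common():
    print(codon, count)
```

Applying the same function to the complete coding sequence would give the gene-wide distribution on which the statement above is based.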

The Arg U tRNA gene was PCR amplified from E. coli genomic DNA using the primers DG284 and DG285. The amplification product was digested with SalI and BamHI. The ColE1-compatible vector pACYC184 was digested with SalI and BamHI, and the Arg U gene fragment was subsequently ligated with the digested vector. DG101 cells were transformed, and the ligated vector was designated pARC01. Finally, DG116 host cells were co-transformed with pARC01 and pTma12-1.

The oligonucleotide primers used in this Example are listed below and in the Sequence Listing section.

    __________________________________________________________________________
    Primers  SEQ ID NO:      Sequence
    __________________________________________________________________________
    DG284    SEQ ID NO: 43   5'CGGGGATCCAAAAGCCATTGACTCAGCAAGG
    DG285    SEQ ID NO: 44   5'GGGGGTCGACGCATGCGAGGAAAATAGACG
    __________________________________________________________________________

    __________________________________________________________________________
    SEQUENCE LISTING
    (1) GENERAL INFORMATION:
    (iii) NUMBER OF SEQUENCES: 46
    (2) INFORMATION FOR SEQ ID NO:1:
    (i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 2682 base pairs
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: DNA (genomic)
    (ix) FEATURE:
    (A) NAME/KEY: CDS
    (B) LOCATION: 1..2682
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:
    ATGGCGAGACTATTTCTCTTTGATGGAACTGCTCTGGCCTACAGAGCG  48
    MetAlaArgLeuPheLeuPheAspGlyThrAlaLeuAlaTyrArgAla
    1           5              10             15
    TACTATGCGCTCGATAGATCGCTTTCTACTTCCACCGGCATTCCCACA  96
    TyrTyrAlaLeuAspArgSerLeuSerThrSerThrGlyIleProThr
             20             25             30
    AACGCCACATACGGTGTGGCGAGGATGCTGGTGAGATTCATCAAAGAC  144
    AsnAlaThrTyrGlyValAlaArgMetLeuValArgPheIleLysAsp
          35             40             45
    CATATCATTGTCGGAAAAGACTACGTTGCTGTGGCTTTCGACAAAAAA  192
    HisIleIleValGlyLysAspTyrValAlaValAlaPheAspLysLys
       50             55             60
    GCTGCCACCTTCAGACACAAGCTCCTCGAGACTTACAAGGCTCAAAGA  240
    AlaAlaThrPheArgHisLysLeuLeuGluThrTyrLysAlaGlnArg
    65             70             75             80
    CCAAAGACTCCGGATCTCCTGATTCAGCAGCTTCCGTACATAAAGAAG  288
    ProLysThrProAspLeuLeuIleGlnGlnLeuProTyrIleLysLys
                85             90             95
    CTGGTCGAAGCCCTTGGAATGAAAGTGCTGGAGGTAGAAGGATACGAA  336
    LeuValGluAlaLeuGlyMetLysValLeuGluValGluGlyTyrGlu
             100            105            110
    GCGGACGATATAATTGCCACTCTGGCTGTGAAGGGGCTTCCGCTTTTT  384
    AlaAspAspIleIleAlaThrLeuAlaValLysGlyLeuProLeuPhe
          115            120            125
    GATGAAATATTCATAGTGACCGGAGATAAAGACATGCTTCAGCTTGTG  432
    AspGluIlePheIleValThrGlyAspLysAspMetLeuGlnLeuVal
       130            135            140
    AACGAAAAGATCAAGGTGTGGCGAATCGTAAAAGGGATATCCGATCTG  480
    AsnGluLysIleLysValTrpArgIleValLysGlyIleSerAspLeu
    145            150            155            160
    GAACTTTACGATGCGCAGAAGGTGAAGGAAAAATACGGTGTTGAACCC  528
    GluLeuTyrAspAlaGlnLysValLysGluLysTyrGlyValGluPro
                165            170            175
    CAGCAGATCCCGGATCTTCTGGCTCTAACCGGAGATGAAATAGACAAC  576
    GlnGlnIleProAspLeuLeuAlaLeuThrGlyAspGluIleAspAsn
             180            185            190
    ATCCCCGGTGTAACTGGGATAGGTGAAAAGACTGCTGTTCAGCTTCTA  624
    IleProGlyValThrGlyIleGlyGluLysThrAlaValGlnLeuLeu
          195            200            205
    GAGAAGTACAAAGACCTCGAAGACATACTGAATCATGTTCGCGAACTT  672
    GluLysTyrLysAspLeuGluAspIleLeuAsnHisValArgGluLeu
       210            215            220
    CCTCAAAAGGTGAGAAAAGCCCTGCTTCGAGACAGAGAAAACGCCATT  720
    ProGlnLysValArgLysAlaLeuLeuArgAspArgGluAsnAlaIle
    225            230            235            240
    CTCAGCAAAAAGCTGGCGATTCTGGAAACAAACGTTCCCATTGAAATA  768
    LeuSerLysLysLeuAlaIleLeuGluThrAsnValProIleGluIle
                245            250            255
    AACTGGGAAGAACTTCGCTACCAGGGCTACGACAGAGAGAAACTCTTA  816
    AsnTrpGluGluLeuArgTyrGlnGlyTyrAspArgGluLysLeuLeu
             260            265            270
    CCACTTTTGAAAGAACTGGAATTCGCATCCATCATGAAGGAACTTCAA  864
    ProLeuLeuLysGluLeuGluPheAlaSerIleMetLysGluLeuGln
          275            280            285
    CTGTACGAAGAGTCCGAACCCGTTGGATACAGAATAGTGAAAGACCTA  912
    LeuTyrGluGluSerGluProValGlyTyrArgIleValLysAspLeu
       290            295            300
    GTGGAATTTGAAAAACTCATAGAGAAACTGAGAGAATCCCCTTCGTTC  960
    ValGluPheGluLysLeuIleGluLysLeuArgGluSerProSerPhe
    305            310            315            320
    GCCATAGATCTTGAGACGTCTTCCCTCGATCCTTTCGACTGCGACATT  1008
    AlaIleAspLeuGluThrSerSerLeuAspProPheAspCysAspIle
                325            330            335
    GTCGGTATCTCTGTGTCTTTCAAACCAAAGGAAGCGTACTACATACCA  1056
    ValGlyIleSerValSerPheLysProLysGluAlaTyrTyrIlePro
             340            345            350
    CTCCATCATAGAAACGCCCAGAACCTGGACGAAAAAGAGGTTCTGAAA  1104
    LeuHisHisArgAsnAlaGlnAsnLeuAspGluLysGluValLeuLys
          355            360            365
    AAGCTCAAAGAAATTCTGGAGGACCCCGGAGCAAAGATCGTTGGTCAG  1152
    LysLeuLysGluIleLeuGluAspProGlyAlaLysIleValGlyGln
       370            375            380
    AATTTGAAATTCGATTACAAGGTGTTGATGGTGAAGGGTGTTGAACCT  1200
    AsnLeuLysPheAspTyrLysValLeuMetValLysGlyValGluPro
    385            390            395            400
    GTTCCTCCTTACTTCGACACGATGATAGCGGCTTACCTTCTTGAGCCG  1248
    ValProProTyrPheAspThrMetIleAlaAlaTyrLeuLeuGluPro
                405            410            415
    AACGAAAAGAAGTTCAATCTGGACGATCTCGCATTGAAATTTCTTGGA  1296
    AsnGluLysLysPheAsnLeuAspAspLeuAlaLeuLysPheLeuGly
             420            425            430
    TACAAAATGACATCTTACCAAGAGCTCATGTCCTTCTCTTTTCCGCTG  1344
    TyrLysMetThrSerTyrGlnGluLeuMetSerPheSerPheProLeu
          435            440            445
    TTTGGTTTCAGTTTTGCCGATGTTCCTGTAGAAAAAGCAGCGAACTAC  1392
    PheGlyPheSerPheAlaAspValProValGluLysAlaAlaAsnTyr
       450            455            460
    TCCTGTGAAGATGCAGACATCACCTACAGACTTTACAAGACCCTGAGC  1440
    SerCysGluAspAlaAspIleThrTyrArgLeuTyrLysThrLeuSer
    465            470            475            480
    TTAAAACTCCACGAGGCAGATCTGGAAAACGTGTTCTACAAGATAGAA  1488
    LeuLysLeuHisGluAlaAspLeuGluAsnValPheTyrLysIleGlu
                485            490            495
    ATGCCCCTTGTGAACGTGCTTGCACGGATGGAACTGAACGGTGTGTAT  1536
    MetProLeuValAsnValLeuAlaArgMetGluLeuAsnGlyValTyr
             500            505            510
    GTGGACACAGAGTTCCTGAAGAAACTCTCAGAAGAGTACGGAAAAAAA  1584
    ValAspThrGluPheLeuLysLysLeuSerGluGluTyrGlyLysLys
          515            520            525
    CTCGAAGAACTGGCAGAGGAAATATACAGGATAGCTGGAGAGCCGTTC  1632
    LeuGluGluLeuAlaGluGluIleTyrArgIleAlaGlyGluProPhe
       530            535            540
    AACATAAACTCACCGAAGCAGGTTTCAAGGATCCTTTTTGAAAAACTC  1680
    AsnIleAsnSerProLysGlnValSerArgIleLeuPheGluLysLeu
    545            550            555            560
    GGCATAAAACCACGTGGTAAAACGACGAAAACGGGAGACTATTCAACA  1728
    GlyIleLysProArgGlyLysThrThrLysThrGlyAspTyrSerThr
                565            570            575
    CGCATAGAAGTCCTCGAGGAACTTGCCGGTGAACACGAAATCATTCCT  1776
    ArgIleGluValLeuGluGluLeuAlaGlyGluHisGluIleIlePro
             580            585            590
    CTGATTCTTGAATACAGAAAGATACAGAAATTGAAATCAACCTACATA  1824
    LeuIleLeuGluTyrArgLysIleGlnLysLeuLysSerThrTyrIle
          595            600            605
    GACGCTCTTCCCAAGATGGTCAACCCAAAGACCGGAAGGATTCATGCT  1872
    AspAlaLeuProLysMetValAsnProLysThrGlyArgIleHisAla
       610            615            620
    TCTTTCAATCAAACGGGGACTGCCACTGGAAGACTTAGCAGCAGCGAT  1920
    SerPheAsnGlnThrGlyThrAlaThrGlyArgLeuSerSerSerAsp
    625            630            635            640
    CCCAATCTTCAGAACCTCCCGACGAAAAGTGAAGAGGGAAAAGAAATC  1968
    ProAsnLeuGlnAsnLeuProThrLysSerGluGluGlyLysGluIle
                645            650            655
    AGGAAAGCGATAGTTCCTCAGGATCCAAACTGGTGGATCGTCAGTGCC  2016
    ArgLysAlaIleValProGlnAspProAsnTrpTrpIleValSerAla
             660            665            670
    GACTACTCCCAAATAGAACTGAGGATCCTCGCCCATCTCAGTGGTGAT  2064
    AspTyrSerGlnIleGluLeuArgIleLeuAlaHisLeuSerGlyAsp
          675            680            685
    GAGAATCTTTTGAGGGCATTCGAAGAGGGCATCGACGTCCACACTCTA  2112
    GluAsnLeuLeuArgAlaPheGluGluGlyIleAspValHisThrLeu
       690            695            700
    ACAGCTTCCAGAATATTCAACGTGAAACCCGAAGAAGTAACCGAAGAA  2160
    ThrAlaSerArgIlePheAsnValLysProGluGluValThrGluGlu
    705            710            715            720
    ATGCGCCGCGCTGGTAAAATGGTTAATTTTTCCATCATATACGGTGTA  2208
    MetArgArgAlaGlyLysMetValAsnPheSerIleIleTyrGlyVal
                725            730            735
    ACACCTTACGGTCTGTCTGTGAGGCTTGGAGTACCTGTGAAAGAAGCA  2256
    ThrProTyrGlyLeuSerValArgLeuGlyValProValLysGluAla
             740            745            750
    GAAAAGATGATCGTCAACTACTTCGTCCTCTACCCAAAGGTGCGCGAT  2304
    GluLysMetIleValAsnTyrPheValLeuTyrProLysValArgAsp
          755            760            765
    TACATTCAGAGGGTCGTATCGGAAGCGAAAGAAAAAGGCTATGTTAGA  2352
    TyrIleGlnArgValValSerGluAlaLysGluLysGlyTyrValArg
       770            775            780
    ACGCTGTTTGGAAGAAAAAGAGACATACCACAGCTCATGGCCCGGGAC  2400
    ThrLeuPheGlyArgLysArgAspIleProGlnLeuMetAlaArgAsp
    785            790            795            800
    AGGAACACACAGGCTGAAGGAGAACGAATTGCCATAAACACTCCCATA  2448
    ArgAsnThrGlnAlaGluGlyGluArgIleAlaIleAsnThrProIle
                805            810            815
    CAGGGTACAGCAGCGGATATAATAAAGCTGGCTATGATAGAAATAGAC  2496
    GlnGlyThrAlaAlaAspIleIleLysLeuAlaMetIleGluIleAsp
             820            825            830
    AGGGAACTGAAAGAAAGAAAAATGAGATCGAAGATGATCATACAGGTC  2544
    ArgGluLeuLysGluArgLysMetArgSerLysMetIleIleGlnVal
          835            840            845
    CACGACGAACTGGTTTTTGAAGTGCCCAATGAGGAAAAGGACGCGCTC  2592
    HisAspGluLeuValPheGluValProAsnGluGluLysAspAlaLeu
       850            855            860
    GTCGAGCTGGTGAAAGACAGAATGACGAATGTGGTAAAGCTTTCAGTG  2640
    ValGluLeuValLysAspArgMetThrAsnValValLysLeuSerVal
    865            870            875            880
    CCGCTCGAAGTGGATGTAACCATCGGCAAAACATGGTCGTGA  2682
    ProLeuGluValAspValThrIleGlyLysThrTrpSer
                885            890
    (2) INFORMATION FOR SEQ ID NO: 2:
    (i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 22 bases
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: Other Nucleic Acid
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:
    CGAGATCTGGNTAYGTWGAAAC  22
    (2) INFORMATION FOR SEQ ID NO: 3:
    (i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 22 bases
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: Other Nucleic Acid
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:
    CGAGATCTGGNTAYGTWGAGAC  22
    (2) INFORMATION FOR SEQ ID NO: 4:
    (i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 22 bases
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: Other Nucleic Acid
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:
    CGAGATCTGGNTAYGTSGAAAC  22
    (2) INFORMATION FOR SEQ ID NO: 5:
    (i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 22 bases
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: Other Nucleic Acid
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:
    CGAGATCTGGNTAYGTSGAGAC  22
    (2) INFORMATION FOR SEQ ID NO: 6:
    (i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 20 bases
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: Other Nucleic Acid
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:
    CGGAATTCRTCRTGWACCTG  20
    (2) INFORMATION FOR SEQ ID NO: 7:
    (i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 20 bases
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: Other Nucleic Acid
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:
    CGGAATTCRTCRTGWACTTG  20
    (2) INFORMATION FOR SEQ ID NO: 8:
    (i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 20 bases
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: Other Nucleic Acid
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
    CGGAATTCRTCRTGSACCTG  20
    (2) INFORMATION FOR SEQ ID NO: 9:
    (i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 20 bases
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: Other Nucleic Acid
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:
    CGGAATTCRTCRTGSACTTG  20
    (2) INFORMATION FOR SEQ ID NO: 10:
    (i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 19 bases
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: Other Nucleic Acid
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:
    CGAGATCTACNGCNACWGG  19
    (2) INFORMATION FOR SEQ ID NO: 11:
    (i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 19 bases
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: Other Nucleic Acid
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:
    CGAGATCTACNGCNACSGG  19
    (2) INFORMATION FOR SEQ ID NO: 12:
    (i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 21 bases
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: Other Nucleic Acid
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:
    ACAGCAGCKGATATAATAAAG  21
    (2) INFORMATION FOR SEQ ID NO: 13:
    (i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 22 bases
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: Other Nucleic Acid
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:
    GCCATGAGCTGTGGTATGTCTC  22
    (2) INFORMATION FOR SEQ ID NO: 14:
    (i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 20 bases
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: Other Nucleic Acid
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:
    TACGTTCCCGGGCCTTGTAC  20
    (2) INFORMATION FOR SEQ ID NO: 15:
    (i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 19 bases
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: Other Nucleic Acid
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:
    AGGAGGTGATCCAACCGCA  19
    (2) INFORMATION FOR SEQ ID NO: 16:
    (i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 48 bases
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: Other Nucleic Acid
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:
    CCATCAAAAAGAAATAGTCTAGCCATATGTGTTTCCTGTGTGAAATTG  48
    (2) INFORMATION FOR SEQ ID NO: 17:
    (i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 18 bases
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: Other Nucleic Acid
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:
    AAACACATATGGCTAGAC  18
    (2) INFORMATION FOR SEQ ID NO: 18:
    (i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 48 bases
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: Other Nucleic Acid
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:
    CCATCAAAAAGAAATAGTCTAGCCATGGTTGTTTCCTGTGTGAAATTG  48
    (2) INFORMATION FOR SEQ ID NO: 19:
    (i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 18 bases
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: Other Nucleic Acid
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:
    AAACAACCATGGCTAGAC  18
    (2) INFORMATION FOR SEQ ID NO: 20:
    (i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 44 bases
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: Other Nucleic Acid
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:
    GCAAAACATGGTCGTGATATCGGATCCGGAGGTGTTATCTGTGG  44
    (2) INFORMATION FOR SEQ ID NO: 21:
    (i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 17 bases
    (B) TYPE: nucleic acid
                                                   (C) STRANDEDNESS: single                                                       (D) TOPOLOGY: linear                                                         (ii) MOLECULE TYPE: Other Nucleic Acid                                        (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:                                     CCGATATCACGACCATG17                                                           (2) INFORMATION FOR SEQ ID NO: 22:                                            (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 28 bases                                                          (B) TYPE: nucleic acid                                                        (C) STRANDEDNESS: single                                                       (D) TOPOLOGY: linear                                                         (ii) MOLECULE TYPE: Other Nucleic Acid                                        (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:                                     CCGGAAGAAGGAGATATACATATGAGCT28                                                (2) INFORMATION FOR SEQ ID NO: 23:                                            (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 20 bases                                                          (B) TYPE: nucleic acid                                                        (C ) STRANDEDNESS: single                                                     (D) TOPOLOGY: linear                                                          (ii) MOLECULE TYPE: Other Nucleic Acid                                        (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:                                     CATATGTATATCTCCTTCTT20                                                        (2) INFORMATION FOR SEQ ID NO: 24:                                            (i) SEQUENCE CHARACTERISTICS:                                                
 (A) LENGTH: 23 bases                                                          (B) TYPE: nucleic acid                                                         (C) STRANDEDNESS: single                                                     (D) TOPOLOGY: linear                                                          (ii) MOLECULE TYPE: Other Nucleic Acid                                        (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:                                     CCGGAGGAGAAAACATATGAGCT23                                                     (2) INFORMATION FOR SEQ ID NO: 25:                                            (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 15 bases                                                          (B) TYPE: nucleic acid                                                         (C) STRANDEDNESS: single                                                     (D) TOPOLOGY: linear                                                          (ii) MOLECULE TYPE: Other Nucleic Acid                                        (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:                                     CATATGTTTTCTCCT15                                                             (2) INFORMATION FOR SEQ ID NO: 26:                                            (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 34 bases                                                          (B) TYPE: nucleic acid                                                         (C) STRANDEDNESS: single                                                     (D) TOPOLOGY: linear                                                          (ii) MOLECULE TYPE: Other Nucleic Acid                                        (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:                                     CCGGAAGAAGGAGAAAATACCATGGGCCCGGTAC34                                          (2) INFORMATION FOR SEQ ID NO: 27:               
                             (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 26 bases                                                          (B) TYPE: nucleic acid                                                        (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                          (ii) MOLECULE TYPE: Other Nucleic Acid                                        (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:                                     CGGGCCCATGGTATTTTCTCCTTCTT26                                                  (2) INFORMATION FOR SEQ ID NO: 28:                                            (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 29 bases                                                           (B) TYPE: nucleic acid                                                       (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                          (ii) MOLECULE TYPE: Other Nucleic Acid                                        (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:                                     CCGGAGGAGAAAATCCATGGGCCCGGTAC29                                               (2) INFORMATION FOR SEQ ID NO: 29:                                            (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 21 bases                                                           (B) TYPE: nucleic acid                                                       (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                          (ii) MOLECULE TYPE: Other Nucleic Acid                                        (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:                                     
CGGGCCCATGGATTTTCTCCT21                                                       (2) INFORMATION FOR SEQ ID NO: 30:                                            (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 28 bases                                                           (B) TYPE: nucleic acid                                                       (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                          (ii) MOLECULE TYPE: Other Nucleic Acid                                        (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:                                     GATAAAGGCATGCTTCAGCTTGTGAACG28                                                (2) INFORMATION FOR SEQ ID NO: 31:                                            (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 27 bases                                                          (B) TYPE: nucleic acid                                                        (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                          (ii) MOLECULE TYPE: Other Nucleic Acid                                        (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:                                     TGTACTTCTCTAGAAGCTGAACAGCAG27                                                 (2) INFORMATION FOR SEQ ID NO: 32:                                            (i) SEQUENCE CHARACTERISTICS:                                                 (A ) LENGTH: 36 bases                                                         (B) TYPE: nucleic acid                                                        (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                          (ii) MOLECULE TYPE: Other Nucleic Acid            
                            (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:                                     CTGAAGCATGTCTTTGTCACCGGTTACTATGAATAT36                                        (2) INFORMATION FOR SEQ ID NO: 33:                                            (i) SEQUENCE CHARACTERISTICS:                                                  (A) LENGTH: 18 bases                                                         (B) TYPE: nucleic acid                                                        (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                          (ii) MOLECULE TYPE: Other Nucleic Acid                                        (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:                                     TAGTAACCGGTGACAAAG18                                                          (2) INFORMATION FOR SEQ ID NO: 34:                                            (i) SEQUENCE CHARACTERISTICS:                                                  (A) LENGTH: 31 bases                                                         (B) TYPE: nucleic acid                                                        (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                          (ii) MOLECULE TYPE: Other Nucleic Acid                                        (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:                                     CTATGCCATGGATAGATCGCTTTCTACTTCC31                                             (2) INFORMATION FOR SEQ ID NO: 35:                                            (i) SEQUENCE CHARACTERISTICS:                                                  (A) LENGTH: 31 bases                                                         (B) TYPE: nucleic acid                                                        (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear  
                                                        (ii) MOLECULE TYPE: Other Nucleic Acid                                        (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:                                     CAAGCCCATGGAAACTTACAAGGCTCAAAGA31                                             (2) INFORMATION FOR SEQ ID NO: 36:                                            (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 60 bases                                                          (B) TYPE: nucleic acid                                                        (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                          (ii) MOLECULE TYPE: Other Nucleic Acid                                        (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:                                     TCAGCTTAAGACTTCGAAATTAATACGACTCACTATAGGGAGACCACAACGGTTTCCCTC60                (2) INFORMATION FOR SEQ ID NO: 37:                                             (i) SEQUENCE CHARACTERISTICS:                                                (A) LENGTH: 62 bases                                                          (B) TYPE: nucleic acid                                                        (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                          (ii) MOLECULE TYPE: Other Nucleic Acid                                        (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:                                     TCGACCATGGGTATATCTCCTTCTTAAAGTTAAACAAAATTATTTCTAGAGGGAAACCGT60                TG 62                                                                         (2) INFORMATION FOR SEQ ID NO: 38:                                            (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 42 bases                                                    
      (B) TYPE: nucleic acid                                                        (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                          (ii) MOLECULE TYPE: Other Nucleic Acid                                        (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:                                     TCAGTCCGG ATAAACAAAATTATTTCTAGAGGGAAACCGTTG42                                 (2) INFORMATION FOR SEQ ID NO: 39:                                            (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 25 bases                                                          (B) TYPE: nucleic acid                                                        (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                          (ii) MOLECULE TYPE: Other Nucleic Acid                                        (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39:                                     TCCGG ACTTTAAGAAGGAGATATAC25                                                  (2) INFORMATION FOR SEQ ID NO: 40:                                            (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 31 bases                                                          (B) TYPE: nucleic acid                                                        (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                          (ii) MOLECULE TYPE: Other Nucleic Acid                                        (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40:                                     A ATAGTCTAGCCATCAGAAAGTCTCCTGTGC31                                            (2) INFORMATION FOR SEQ ID NO: 41:                                            (i) SEQUENCE CHARACTERISTICS:               
                                  (A) LENGTH: 28 bases                                                          (B) TYPE: nucleic acid                                                        (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                          (ii) MOLECULE TYPE: Other Nucleic Acid                                        (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41:                                     AGACTTTCTGATGGCTAGACTATTTCTT28                                                (2) INFORMATION FOR SEQ ID NO: 42:                                            (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 29 bases                                                          (B) TYPE: nucleic acid                                                        (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                          (ii) MOLECULE TYPE: Other Nucleic Acid                                        (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42:                                     CTGAATCAGGAGACCCGGGGTCTTTGGTC29                                               (2) INFORMATION FOR SEQ ID NO: 43:                                            (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 31 bases                                                          (B) TYPE: nucleic acid                                                        (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                          (ii) MOLECULE TYPE: Other Nucleic Acid                                        (x i) SEQUENCE DESCRIPTION: SEQ ID NO: 43:                                    CGGGGATCCAAAAGCCATTGACTCAGCAAGG31                                             (2) INFORMATION 
FOR SEQ ID NO: 44:                                            (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 30 bases                                                          (B) TYPE: nucleic acid                                                        (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                          (ii) MOLECULE TYPE: Other Nucleic Acid                                         (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44:                                    GGGGGTCGACGCATGCGAGGAAAATAGACG30                                              (2) INFORMATION FOR SEQ ID NO: 45:                                            (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 25 bases                                                          (B) TYPE: nucleic acid                                                        (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                          (ii) MOLECULE TYPE: Other Nucleic Acid                                        (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45:                                     GGCGTACCTTTGTCTCACGGGCAAC25                                                   (2) INFORMATION FOR SEQ ID NO: 46:                                            (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 74 bases                                                          (B) TYPE: nucleic acid                                                        (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                          (ii) MOLECULE TYPE: Other Nucleic Acid                                        (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46:                         
            CTTTAAGAAGGAGATATACATATGGCTAGCATGACTGGTGGACAGCAAATGCATGCACAG60                GAGACTTTCTGATG74                                                          
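
The declared lengths in the listing above can be checked mechanically against the sequences themselves. The following is a minimal sketch, not part of the listing: the helper names `reverse_complement`, `length_mismatches`, and `anneals_within` are hypothetical, and only SEQ ID NOS: 16 to 19 are transcribed as a sample. It also checks whether each 18-base oligonucleotide is the reverse complement of a stretch of the 48-base oligonucleotide listed immediately before it, which is an observation about the sequences as printed and not a statement about how they were used.

```python
# Sketch only: helper names are hypothetical; sequences are copied from
# SEQ ID NOS: 16-19 of the listing, paired with their declared lengths.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

OLIGOS = {
    16: ("CCATCAAAAAGAAATAGTCTAGCCATATGTGTTTCCTGTGTGAAATTG", 48),
    17: ("AAACACATATGGCTAGAC", 18),
    18: ("CCATCAAAAAGAAATAGTCTAGCCATGGTTGTTTCCTGTGTGAAATTG", 48),
    19: ("AAACAACCATGGCTAGAC", 18),
}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def length_mismatches(oligos: dict) -> list:
    """List the SEQ ID numbers whose sequence length differs from the declared length."""
    return [n for n, (seq, declared) in oligos.items() if len(seq) != declared]


def anneals_within(short: str, long: str) -> bool:
    """True if the reverse complement of `short` occurs as a substring of `long`."""
    return reverse_complement(short) in long


if __name__ == "__main__":
    print("length mismatches:", length_mismatches(OLIGOS))
    print("17 complementary to part of 16:", anneals_within(OLIGOS[17][0], OLIGOS[16][0]))
    print("19 complementary to part of 18:", anneals_within(OLIGOS[19][0], OLIGOS[18][0]))
```

Extending `OLIGOS` with the remaining entries would apply the same length check to the whole listing.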

We claim:
1. A recombinant DNA sequence that encodes amino acids number 140 to 893 of SEQ ID NO: 1.
2. The DNA sequence of claim 1 that is nucleotides number 418 to 2682 of SEQ ID NO: 1.
3. A recombinant DNA sequence that encodes amino acids number 284 to 893 of SEQ ID NO: 1.
4. The DNA sequence of claim 3 that is nucleotides number 850 to 2682 of SEQ ID NO: 1.
5. A truncated Thermotoga maritima DNA polymerase gene that encodes a thermostable Thermotoga maritima DNA polymerase that catalyzes the combination of nucleoside triphosphates to form a nucleic acid strand complementary to a nucleic acid template strand wherein said polymerase has the following additional characteristics: (1) it comprises a 3'→5' exonuclease activity; (2) it has an optimal polymerization activity between 65° C. and 75° C.; (3) it has a molecular weight of about 86 kDa; and (4) it does not have a 5'→3' exonuclease activity.
6. The recombinant DNA sequence of claim 5 wherein said polymerization activity is catalyzed by a polypeptide domain consisting of the amino acid sequence 485-893 of SEQ ID NO: 1.
7. The recombinant DNA sequence of claim 5 wherein said 3'→5' exonuclease activity is catalyzed by a polypeptide domain consisting of the amino acid sequence 291-484 of SEQ ID NO: 1.
8. A truncated Thermotoga maritima DNA polymerase gene that encodes a thermostable Thermotoga maritima DNA polymerase that catalyzes the combination of nucleoside triphosphates to form a nucleic acid strand complementary to a nucleic acid template strand wherein said polymerase has the following additional characteristics: (1) it has a 3'→5' exonuclease activity; (2) it has an optimal polymerization activity between 65° C. and 75° C.; (3) it has a molecular weight of about 70 kDa; (4) it has a half life at 97.5° C. of about 50 minutes; and (5) it does not have a 5'→3' exonuclease activity.
9. The recombinant DNA sequence of claim 8 wherein said polymerization activity is catalyzed by a polypeptide domain consisting of the amino acid sequence 485-893 of SEQ ID NO: 1.
10. A truncated thermostable Thermotoga maritima DNA polymerase that catalyzes the combination of nucleoside triphosphates to form a nucleic acid strand complementary to a nucleic acid template strand wherein said polymerase has the following additional characteristics: (1) it comprises a 3'→5' exonuclease activity; (2) it has an optimal polymerization activity between 65° C. and 75° C.; (3) it has a molecular weight of about 86 kDa; and (4) it does not have a 5'→3' exonuclease activity.
11. The DNA polymerase of claim 10 wherein said 3'→5' exonuclease activity is catalyzed by a polypeptide domain consisting of the amino acid sequence 291-484 of SEQ ID NO: 1.
12. The DNA polymerase of claim 10 wherein said polymerization activity is catalyzed by a polypeptide domain consisting of the amino acid sequence 485-893 of SEQ ID NO: 1.
13. A truncated thermostable Thermotoga maritima DNA polymerase that catalyzes the combination of nucleoside triphosphates to form a nucleic acid strand complementary to a nucleic acid template strand wherein said polymerase has the following additional characteristics: (1) it has a 3'→5' exonuclease activity; (2) it has an optimal polymerization activity between 65° C. and 75° C.; (3) it has a molecular weight of about 70 kDa; (4) it has a half life at 97.5° C. of about 50 minutes; and (5) it does not have a 5'→3' exonuclease activity.
14. The DNA polymerase of claim 13 wherein said polymerization activity is catalyzed by a polypeptide domain consisting of the amino acid sequence 485-893 of SEQ ID NO: 1.
15. A purified chimeric, non-Thermotoga maritima thermostable DNA polymerase that catalyzes the combination of nucleoside triphosphates to form a nucleic acid strand complementary to a nucleic acid template strand wherein said polymerase comprises a 3'→5' exonuclease activity and wherein said 3'→5' exonuclease activity is catalyzed by a polypeptide domain consisting of the amino acid sequence 291-484 of SEQ ID NO: 1.
16. The enzyme encoded by the DNA sequence of claim 1.
17. The enzyme encoded by the DNA sequence of claim 3.
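
The nucleotide ranges recited in claims 2 and 4 can be related to the amino acid ranges of claims 1 and 3 by simple codon arithmetic. The sketch below is illustrative only and rests on two assumptions that are not stated in the claims themselves: that codon 1 of the coding sequence begins at nucleotide 1 of SEQ ID NO: 1, and that the recited end position 2682 includes a three-nucleotide stop codon after residue 893. The function names `codon_span` and `domain_length` are hypothetical.

```python
# Sketch under stated assumptions: codon 1 starts at nucleotide 1, and the
# claimed end position includes a 3-nucleotide stop codon after the last residue.

def codon_span(first_aa: int, last_aa: int, include_stop: bool = False) -> tuple:
    """Return 1-based, inclusive nucleotide coordinates covering residues first_aa..last_aa."""
    start = 3 * (first_aa - 1) + 1              # first base of the first codon
    end = 3 * last_aa + (3 if include_stop else 0)  # last base of the last (or stop) codon
    return start, end


def domain_length(first_aa: int, last_aa: int) -> int:
    """Number of residues in an inclusive amino acid range."""
    return last_aa - first_aa + 1


if __name__ == "__main__":
    # Claims 1 and 2: amino acids 140-893, nucleotides 418-2682.
    assert codon_span(140, 893, include_stop=True) == (418, 2682)
    # Claims 3 and 4: amino acids 284-893, nucleotides 850-2682.
    assert codon_span(284, 893, include_stop=True) == (850, 2682)
    # Domain ranges recited in the dependent claims.
    print("exonuclease domain residues:", domain_length(291, 484))
    print("polymerase domain residues:", domain_length(485, 893))
```

Under those assumptions the recited coordinates are mutually consistent: residue 140 begins at nucleotide 418, residue 284 begins at nucleotide 850, and 3 × 893 + 3 = 2682.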